[Statistical Computing] 9. Random Number Generation
Computational Statistics
Table of Contents
- Computer Arithmetic
- LU decomposition
- Cholesky Decomposition
- QR Decomposition
- Iterative Methods
- Non-linear Equations
- Optimization
- Numerical Integration
- Random Number Generation
- Monte Carlo Integration
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scatterplot3d_0.3-41
loaded via a namespace (and not attached):
[1] fansi_0.5.0 digest_0.6.27 utf8_1.2.1 crayon_1.4.1
[5] IRdisplay_1.0 repr_1.1.3 lifecycle_1.0.0 jsonlite_1.7.2
[9] evaluate_0.14 pillar_1.6.1 rlang_0.4.11 uuid_0.1-4
[13] vctrs_0.3.8 ellipsis_0.3.2 IRkernel_1.1 tools_3.6.3
[17] compiler_3.6.3 base64enc_0.1-3 pbdZMQ_0.3-5 htmltools_0.5.1.1
# install.packages("ggplot2")
library(scatterplot3d)
library(ggplot2)
Registered S3 methods overwritten by 'tibble':
method from
format.tbl pillar
print.tbl pillar
Registered S3 methods overwritten by 'ggplot2':
method from
[.quosures rlang
c.quosures rlang
print.quosures rlang
Random number generation
Uniform random number generation
Goal
To generate $U_i \stackrel{iid}{\sim} \text{unif}(0, 1)$.
- Basis for all other random number generation
- Fact: there are NO truly random numbers in a computer; generated sequences only look random (PSEUDO-RANDOM).
Congruential generator
\begin{align}
x_{i+1} &= a x_i \mod m, \quad i=1, 2, \dotsc \\
u_i &= x_i / m.
\end{align}
- Modulus $m$ is a large prime number.
- Multiplier $a$ is a positive integer between 2 and $m-1$.
- Map $f: x \mapsto ax \mod m$ maps $\{1, \dotsc, m-1\}$ onto itself and is one-to-one:
- Suppose $x\in\{1, \dotsc, m-1\}$. Then $f(x) \neq 0$ since $a$ and $m$ are relatively prime, hence $ax$ is not a multiple of $m$. Thus $f$ maps $\{1, \dotsc, m-1\}$ to $\{1, \dotsc, m-1\}$.
- If $ay = ax \mod m$, then $a(y - x) = m k$ for some integer $k$. Since $a$ and $m$ are relatively prime, $y - x = m l$ for some integer $l$. That is, $x = y \mod m$. Hence the map $f$ is one-to-one.
- Since $f$ maps $\{1, \dotsc, m-1\}$ to $\{1, \dotsc, m-1\}$ and is one-to-one, $f$ is also onto.
- Note
\begin{align}
x_1 &= a x_0 \mod m \\
x_2 &= a x_1 \mod m = a^2 x_0 \mod m \\
x_3 &= a x_2 \mod m = a^3 x_0 \mod m \\
& \vdots \\
x_n &= a x_{n-1} \mod m = a^n x_0 \mod m
\end{align}
Hence if $a^n = 1 \mod m$ for some $n$, then \(x_n = x_0 \mod m\) and the (pseudo)random number generator repeats the sequence. The smallest such $n$ is called the period of the RNG, and $x_0$ is called the seed.
set.seed(2020) # set 2020th seed; does not mean x_0 = 2020
runif(5)
set.seed(2020) # same seed results in same "random" sequence
runif(5)
0.646902838954702 0.394225758267567 0.618501814315096 0.476891135564074 0.136097185546532
0.646902838954702 0.394225758267567 0.618501814315096 0.476891135564074 0.136097185546532
- Primitive root of unity
- Fermat’s little theorem: $a^{m} = a \mod m$.
- Since $a$ is not divisible by $m$, we have $a^{m-1} = 1 \mod m$.
- Thus the period $n$ satisfies $n \le m - 1$.
- Primitive root of unity: $a$ such that $n = m - 1$
- If $a$ is primitive, then $x_1, x_2, \dotsc, x_{m-1}$ is a permutation of $\{1, 2, \dotsc, m-1\}$.
- For $m=2^{31} - 1$ (a Mersenne prime), $a=7^5 = 16807$ is primitive, leading to a period of $2^{31} - 2$ = 2,147,483,646. This RNG was used in early versions of MATLAB (up to v4). (A small numerical check with a toy modulus follows below.)
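A small numerical check of the primitive-root property (added here, not part of the original notes), using a toy modulus: for the prime $m = 31$, $a = 3$ is a primitive root, so the generator should visit every element of $\{1, \dotsc, 30\}$ exactly once before repeating.
# Toy check (assumed values): m = 31 is prime and a = 3 is a primitive root,
# so x -> (a * x) %% m should cycle through all of 1, ..., 30 (period m - 1).
m_small <- 31
a_small <- 3
x <- 1                      # seed x_0 = 1
orbit <- integer(0)
repeat {
  x <- (a_small * x) %% m_small
  orbit <- c(orbit, x)
  if (x == 1) break         # back at the seed: one full period completed
}
length(orbit)               # 30 == m_small - 1
sort(orbit)                 # a permutation of 1, ..., 30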
Good RNGs should have long periods, and should give samples which appear to be drawn from a uniform distribution. If the size of the sample is much less than the period of the RNG, then the sample should appear random.
Autocorrelation
- Ideally, a pseudorandom number sequence should look i.i.d. If we take the first $p$ values to form a $p$-dimensional vector, the second $p$ values to form a second $p$-dimensional vector, etc., then these vectors should fill the $p$-dimensional hypercube uniformly.
- However, by construction the sequence generated by a congruential generator depends on the previous value, and hence tends to be correlated.
- It is known that congruential generators tend to give $p$-dimensional vectors that concentrate on lower-dimensional hyperplanes, for some $p$.
# IBM System/360 RANDU generator
# a = 2^16 + 3 = 65539
# m = 2^31
n <- 2000
a <- 65539
m <- 2^31
u <- vector(length=n)
u[1] <- 1
for (i in seq_len(n-1)) {
u[i+1] <- (a * u[i]) %% m   # parentheses needed: %% binds tighter than * in R
}
scatterplot3d(u[1:(n-2)], u[2:(n-1)], u[3:n], angle=160)
# Early MATLAB RNG
# a = 7^5
# m = 2^31 - 1
n <- 2000
a2 <- 7^5
m2 <- 2^31 - 1
v <- vector(length=n)
v[1] <- 1
for (i in seq_len(n-1)) {
v[i+1] <- (a2 * v[i]) %% m2   # parentheses needed: %% binds tighter than * in R
}
scatterplot3d(v[1:(n-2)], v[2:(n-1)], v[3:n], angle=20)
A simple modification is to introduce shuffling in the sequence, which we won’t cover in detail.
R’s RNG
- R uses the Mersenne-Twister as the default RNG. This RNG was developed by Matsumoto and Nishimura in 1998, and is the first algorithm whose period ($2^{19937} - 1$) exceeds the number of electron spin changes since the creation of the Universe ($10^{6000}$ against $10^{120}$)!
- Mersenne-Twister guarantees 623 consecutive dimensions to be equidistributed (over the whole period).
RNGkind()
[1] "Mersenne-Twister" "Inversion" "Rejection"
Transformation methods
From now on, we assume that the problem of generating uniform random numbers has been solved for practical purposes.
Inverse CDF method
For a random variable $X$, let $F$ be its cumulative distribution function (CDF), that is, $F(x) = P(X \le x)$. Recall that $F$ is right-continuous and nondecreasing. Also, if $F$ is strictly increasing, the random variable $F(X)$ is uniformly distributed on $[0, 1]$. Below, we generalize this result.
The inverse CDF of $X$ is defined as \(F^{-1}(u) = \inf\{x: F(x) \ge u\},\) which coincides with the usual inverse of $F$ if $F$ is strictly increasing.
Proposition 1. Let $X$ be a random variable with CDF $F$. Then the following holds.
- If $F$ is continuous, then $U = F(X)$ is a uniform random variable on $[0, 1]$.
- If $F$ is not continuous, then $P[F(X) \le y] \le y$ for all $y \in [0, 1]$.
- If $U$ is uniform on $[0, 1]$, then $F^{-1}(U)$ has CDF $F$.
Proof.
- Part 1: We will show that \(P[F(X) \le F(t)] = F(t) \tag{*}\) for any $t$. Suppose for now this is true. Let $u \in (0, 1)$. Then by continuity of $F$, there is $y$ such that $F(y) = u$. By (*), \(P[F(X) \le u] = P[F(X) \le F(y)] = F(y) = u.\)
- Part 3: It suffices to show that $\{F^{-1}(U) \le t\} = \{U \le F(t)\}$. If $F^{-1}(u)=\inf\{x: F(x) \ge u\} \le t$, then by the monotonicity and right-continuity of $F$, the set $\{x: F(x) \ge u\}$ is a half-closed interval containing its left endpoint, which is $F^{-1}(u)$. Hence $F(F^{-1}(u)) \ge u$. Since $F^{-1}(u) \le t$, again by monotonicity of $F$, it follows that $u \le F(F^{-1}(u)) \le F(t)$. Conversely, if $u \le F(t)$, then by definition $F^{-1}(u) \le t$.
- Part 2: By part 3, $X\stackrel{d}{=}F^{-1}(U)$, where $U$ is uniform on $[0, 1]$. Since $x=F^{-1}(u)$ implies $u \le F(x)$, \(P[F(X) \le y] \le P(U \le y) = y.\)
- It remains to show (*). Monotonicity of $F$ yields $\{X > t\} \subset \{F(X) \ge F(t)\}$. Hence $\{X > t \} \cap \{F(X) < F(t) \} = \emptyset$. Likewise $\{X \le t\} \cap \{F(X) > F(t)\} = \emptyset$. Then,
\begin{align}
\{X > t\} \cap \{F(X) \le F(t)\} &= [\{X > t\} \cap \{F(X) < F(t) \}] \cup [\{X > t\} \cap \{F(X) = F(t)\}] \\
&= \{X > t\} \cap \{F(X) = F(t)\}, \\
\{X \le t\} \cap \{F(X) \le F(t)\} &= \{X \le t\} \cap \{F(X) > F(t)\}^c = \{X \le t\}.
\end{align}
So, \(\{F(X) \le F(t) \} = \{X \le t\} \cup [\{X > t\} \cap \{F(X) = F(t)\}] =: A \cup B.\) Events $A$ and $B$ are disjoint, and event $B$ corresponds to values $x$ of $X$ at which $F$ is flat; since $X$ has CDF $F$, the flat regions carry no probability and $P(B)=0$. Therefore, \(P[F(X) \le F(t)] = P(A) + P(B) = P(A) = P(X \le t) = F(t).\)
Exponential distribution
- CDF $F(x) = 1 - e^{-\lambda x}$ yields $F^{-1}(u) = -\frac{1}{\lambda}\log(1 - u)$ for $u \in (0, 1)$; since $1 - U \sim \text{unif}(0, 1)$ whenever $U$ is, we may equivalently use $-\frac{1}{\lambda}\log U$, as in the code below.
# Exp(1)
n <- 1000
u <- runif(n)
x <- -log(u)
expdata <- data.frame(x)
plt <- ggplot(expdata, aes(x=x)) +
geom_histogram(binwidth=0.25, fill="white", color="black") +
geom_density(aes(y=0.25 * ..count..), color="purple")
print(plt)
Cauchy distribution
From the density of the Cauchy distribution \(f(x) = \frac{\beta}{\pi(\beta^2 + (x-\mu)^2)},\) its CDF is $F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}\left(\frac{x-\mu}{\beta}\right)$. Thus \(F^{-1}(u) = \mu + \beta\tan(\pi u - \pi/2) = \mu - \frac{\beta}{\tan(\pi u)} .\)
# standard Cauchy (beta=1, mu=0)
n <- 1000
u <- runif(n)
x <- -1/tan(pi * u)
#hist(x, breaks=40)
cauchydata <- data.frame(x)
plt <- ggplot(cauchydata, aes(x=x)) +
geom_histogram(binwidth=10, fill="white", color="black") +
geom_density(aes(y=10 * ..count..), color="purple") +
xlim(-200, 200)
print(plt)
Warning message:
“Removed 4 rows containing non-finite values (stat_bin).”
Warning message:
“Removed 4 rows containing non-finite values (stat_density).”
Warning message:
“Removed 2 rows containing missing values (geom_bar).”
Discrete uniform distribution
$X \sim \text{unif}(\{1, 2, \dotsc, k\})$. It is easy to verify that $F(x) = \frac{1}{k}\lfloor x \rfloor$ for $x \in [0, k]$ and $F^{-1}(u)=\lceil ku \rceil$.
k <- 10
n <- 1000
u <- runif(n)
x <- ceiling(k * u)
table(x)
x
1 2 3 4 5 6 7 8 9 10
108 108 92 93 97 85 94 118 108 97
Geometric distribution
If $X \sim \text{geom}(p)$, then its probability mass function is $p(x) = (1-p)^{x-1}p$.
For $Y \sim \text{Exp}(\lambda)$,
\begin{align}
P(\lceil Y \rceil = k) &= P(k-1 < Y \le k) = F_Y(k) - F_Y(k-1) = (1 - e^{-\lambda k}) - (1 - e^{-\lambda(k-1)}) \\
&= e^{-\lambda(k-1)}(1 - e^{-\lambda}) \\
&= (1 - p)^{k-1} p
\end{align}
if $\lambda$ satisfies $p = 1 - e^{-\lambda}$, or $\lambda = -\log(1-p)$.
For this $\lambda$, $X = \lceil Y \rceil = \lceil -\frac{1}{\lambda}\log U \rceil = \lceil \frac{\log U}{\log(1-p)}\rceil$.
gengeom <- function(p, nsamp=1) {
u <- runif(nsamp)
y <- log(u) / log(1 - p)
ceiling(y)
}
nsamp <- 1000
p <- 0.3
x <- gengeom(p, nsamp)
geomdata <- data.frame(x)
plt <- ggplot(geomdata, aes(x=x)) + geom_histogram(binwidth=0.25)
print(plt)
Normal random numbers
For $X \sim N(0, 1)$, inverse CDF $\Phi^{-1}$ does not have a closed form.
Box-Muller
Generates $X, Y \stackrel{iid}{\sim} N(0, 1)$.
Transforms the random Cartesian coordinates $(X, Y)$ to polar coordinates $(R, \Theta)$. Since $(X, Y)=(R\cos\Theta, R\sin\Theta)$, \(\iint f_{XY}(x, y)\,dx\,dy = \iint \frac{1}{2\pi}\exp\left(-\frac{x^2 + y^2}{2}\right)dx\,dy = \iint \frac{1}{2\pi}\exp\left(-\frac{r^2}{2}\right)r\,dr\,d\theta .\)
Hence $R$ has density $f_R(r) = r\exp(-\frac{r^2}{2})$, $\Theta$ is uniform on $[0, 2\pi]$, and they are independent. Since \(P(R > \rho) = P(R^2 > \rho^2) = \int_\rho^{\infty} r\exp(-\frac{r^2}{2})dr = \exp(-\frac{1}{2}\rho^2),\) the random variable $R^2$ is exponentially distributed with $\lambda = 1/2$.
Thus for independent $U, V \sim \text{unif}(0, 1)$, set \(R = (-2\log U)^{1/2}, \quad \Theta = 2\pi V .\) Then $(X, Y) = (R\cos\Theta, R\sin\Theta)$ are independently $N(0, 1)$.
boxmuller <- function(nsamp) {
n <- ceiling(nsamp / 2)
u <- runif(n)
v <- runif(n)
r <- sqrt(-2 * log(u))
theta <- 2 * pi * v
x <- r * cos(theta)
y <- r * sin(theta)
samp <- c(x, y)
samp[seq_len(nsamp)]
}
#hist(c(x, y))
n <- 1000
normdata1 <- data.frame(x = boxmuller(n))
plt <- ggplot(normdata1, aes(x=x)) +
geom_histogram(binwidth=0.25, fill="white", color="black") +
geom_density(aes(y=0.25 * ..count..), color="purple")
print(plt)
Marsaglia
- From the Box-Muller transformation, we learned that if $R^2 \sim \text{Exp}(1/2)$ and $\Theta\sim\text{unif}(0, 2\pi)$ and they are independent, then $(R\cos\Theta, R\sin\Theta)$ are i.i.d. standard normal.
- On the other hand, if a random point $(U, V)$ is uniformly distributed on the unit disk and we let $(U, V) = (T\cos\Phi, T\sin\Phi)$, then \(P(T^2 \le t^2, 0 \le \Phi \le \phi) = \frac{t^2\phi}{2\pi}, \quad t^2 \le 1, ~ \phi \in (0, 2\pi),\) which means that $T^2 = U^2 + V^2 \sim \text{unif}(0, 1)$, $\Phi \sim \text{unif}(0, 2\pi)$, and they are independent.
- Therefore, if we sample $(U, V)$ uniformly from the unit disk and set $T = \sqrt{U^2 + V^2}$, $\cos\Phi = U/T$, $\sin\Phi=V/T$, then $T^2\sim\text{unif}(0,1)$, so $-2\log T^2$ is identically distributed to $R^2$, i.e., $\text{Exp}(1/2)$, and $\Phi$ is identically distributed to $\Theta$, i.e., uniform on $(0, 2\pi)$. Therefore, if we set $(X, Y)$ such that \(X = \sqrt{-2\log T^2}\frac{U}{T}, \quad Y = \sqrt{-2\log T^2}\frac{V}{T},\) then $X, Y$ are i.i.d. standard normal.
- One way to sample from the unit disk is to sample $(U, V)$ from $\text{unif}[-1, 1]\times\text{unif}[-1, 1]$, and discard the sample if $U^2 + V^2 > 1$ and resample (see acceptance-rejection sampling below).
- Algorithm:
- $U, V \stackrel{iid}{\sim} \text{unif}[-1, 1]$;
- If $T = \sqrt{U^2 + V^2} > 1$, go to 1;
- Set $X = \sqrt{-2\log T^2}\frac{U}{T}$ and $Y = \sqrt{-2\log T^2}\frac{V}{T}$.
- This method avoids the trigonometric function evaluations of Box-Muller, but uses $4/\pi$ times as many random pairs on average.
marsaglia <- function(nsamp) {
n <- ceiling(nsamp / 2)
it <- 0
x <- numeric(n)
y <- numeric(n)
while (it < n) {
u <- 2 * runif(1) - 1
v <- 2 * runif(1) - 1
tau <- sqrt(u^2 + v^2)
if (tau > 1) next  # reject points outside the unit disk and resample
it <- it + 1       # increment first: R vectors are 1-indexed
x[it] <- sqrt(-4 * log(tau)) * u / tau
y[it] <- sqrt(-4 * log(tau)) * v / tau
}
samp <- c(x, y)
samp[seq_len(nsamp)]
}
n <- 1000
normdata2 <- data.frame(x = marsaglia(n))
plt <- ggplot(normdata2, aes(x=x)) +
geom_histogram(binwidth=0.25, fill="white", color="black") +
geom_density(aes(y=0.25 * ..count..), color="purple")
print(plt)
Random numbers by definition
Bernoulli
- Set the success probability $p$
- Generate $U \sim \text{unif}(0, 1)$.
- Let $X = \mathbb{I}_{\{U \le p\}}$.
- Then $X \sim \text{Ber}(p)$.
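A one-line sketch of this recipe (added here, not in the original notes); the values of p and n below are arbitrary.
# Bernoulli(p) by thresholding a uniform: X = 1{U <= p}
p <- 0.3
n <- 1000
x <- as.integer(runif(n) <= p)
mean(x)   # approximately p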
Binomial
- Set the success probability $p$
- Generate $n$ independent $X_i \sim \text{Ber}(p)$.
- Let $X_n = \sum_{i=1}^n X_i$.
- Then $X_n \sim B(n, p)$.
Negative binomial
- Set the success probability $p$
- Generate $r$ independent $X_i \sim \text{Geom}(p)$.
- Let $X_r = \sum_{i=1}^r X_i$.
- Then $X_r \sim \text{NegBin}(r, p)$.
Poisson
- Generate $U_i \stackrel{iid}{\sim} \text{unif}(0, 1)$.
- Find $N$ such that $\prod_{i=1}^N U_i \ge e^{-\lambda} > \prod_{i=1}^{N+1} U_i$.
- Then $N \sim \text{Poi}(\lambda)$.
- This is because $T_i = -\frac{1}{\lambda}\log U_i \stackrel{iid}{\sim} \text{Exp}(\lambda)$ is the waiting time between the $(i-1)$st and the $i$th events of the Poisson counting process $N(t) \sim \text{Poi}(\lambda t)$.
- That is, $N \sim \text{Poi}(\lambda) \iff T_1 + \dotsc + T_N \le 1 < T_1 + \dotsc + T_N + T_{N+1}$.
## Binomial random number generation
genbin <- function(n, p) {
u <- runif(n)
sum(u < p)   # number of successes among n Bernoulli(p) trials
}
n <- 10; p <- 0.6
nsamp <- 1000
x <- numeric(nsamp)
for (i in seq_len(nsamp)) {
x[i] <- genbin(n, p)
}
bindata <- data.frame(x)
plt <- ggplot(bindata, aes(x=x)) + geom_histogram(binwidth=0.25)
print(plt)
# Negative binomial random number generation
gengeom <- function(p, nsamp=1) {
u <- runif(nsamp)
y <- log(u) / log(1 - p)
ceiling(y)
}
nsamp <- 1000
p <- 0.6
r <- 5
x <- numeric(nsamp)
for (i in seq_len(r)) {
x <- x + gengeom(p, nsamp)
}
negbindata <- data.frame(x)
plt <- ggplot(negbindata, aes(x=x)) + geom_histogram(binwidth=0.25)
print(plt)
# Poisson random number generation
genpoi <- function(lam, maxiter=1000) {
u_cum <- 1.0
k <- 0
while (u_cum > exp(-lam) && k < maxiter ) {
u <- runif(1)
u_cum <- u_cum * u
k <- k + 1
}
k
}
lam <- 3 # Poisson rate
nsamp <- 1000
x <- numeric(nsamp)
for (i in seq_len(nsamp)) {
x[i] <- genpoi(lam)
}
poidata <- data.frame(x)
plt <- ggplot(poidata, aes(x=x)) + geom_histogram(binwidth=0.25)
print(plt)
Chi-square
- Generate $Z_1, \dotsc, Z_{\nu} \stackrel{iid}{\sim} N(0, 1)$.
- Let $X_{\nu} = \sum_{i=1}^{\nu} Z_i^2$.
- Then $X_{\nu} \sim \chi^2(\nu)$.
Alternatively, for even $\nu$:
- Generate $U_i \stackrel{iid}{\sim} \text{unif}(0, 1)$.
- Let $X_{\nu} = -2\log(\prod_{i=1}^{\nu/2} U_i)$.
- Then $X_{\nu} \sim \chi^2(\nu)$.
This is because $\chi^2(\nu) = \text{Gamma}(\nu/2, 2)$, where 2 is the scale parameter.
Student’s $t$
- Generate $Z \sim N(0, 1)$ and $X \sim \chi^2(\nu)$ independently.
- Let $T = Z / \sqrt{X/\nu}$.
- Then $T \sim t(\nu)$.
$F$
- Generate $X_1 \sim \chi^2(\nu_1)$ and $X_2 \sim \chi^2(\nu_2)$ independently.
- Let \(F = \frac{X_1/\nu_1}{X_2/\nu_2}.\)
- Then $F \sim F(\nu_1, \nu_2)$.
## chi-square random number generation
genchisq1 <- function(nsamp, nu) {
z <- matrix(rnorm(nsamp * nu), nrow=nsamp)
rowSums(z^2)
}
nu <- 6
n <- 1000
chisqdata1 <- data.frame(x = genchisq1(n, nu))
plt <- ggplot(chisqdata1, aes(x=x)) +
geom_histogram(binwidth=0.25, fill="white", color="black") +
geom_density(aes(y=0.25 * ..count..), color="purple")
print(plt)
## chi-square random number generation 2
genchisq2 <- function(nsamp, nu) {
u <- matrix(runif(nsamp * nu / 2), nrow=nsamp)
-2 * log(apply(u, 1, prod) )
}
nu <- 6
n <- 1000
chisqdata2 <- data.frame(x = genchisq2(n, nu))
plt <- ggplot(chisqdata2, aes(x=x)) +
geom_histogram(binwidth=0.25, fill="white", color="black") +
geom_density(aes(y=0.25 * ..count..), color="purple")
print(plt)
## Student's t random number generation
gent <- function(nsamp, nu) {
z <- rnorm(nsamp)
chisq <- genchisq1(nsamp, nu)
z / sqrt(chisq / nu)   # t statistic with nu degrees of freedom
}
nu <- 6
n <- 1000
tdata <- data.frame(x = gent(n, nu))
plt <- ggplot(tdata, aes(x=x)) +
geom_histogram(binwidth=0.25, fill="white", color="black") +
geom_density(aes(y=0.25 * ..count..), color="purple")
print(plt)
# F random number generation
genF <- function(nsamp, nu1, nu2) {
chisq1 <- genchisq1(nsamp, nu1)
chisq2 <- genchisq1(nsamp, nu2)
(chisq1 / nu1) / (chisq2 / nu2)   # ratio of scaled chi-squares
}
nu1 <- 10; nu2 <- 6
n <- 1000
Fdata <- data.frame(x = genF(n, nu1, nu2))
plt <- ggplot(Fdata, aes(x=x)) +
geom_histogram(binwidth=0.25, fill="white", color="black") +
geom_density(aes(y=0.25 * ..count..), color="purple")
print(plt)
Acceptance-rejection sampling (or just rejection sampling)
Suppose we want to sample from a distribution with a complicated density $f(x)$. It is not easy to use the above methods, since neither the CDF nor its inverse is analytically available.
John von Neumann, while working on the Manhattan Project, suggested the following:
- Find an "envelope" of $f$, i.e., a simple density $g(x)$ such that \(f(x) \le c g(x) =: h(x)\) for all $x$, for some constant $c > 0$.
- Sample a random variable $X$ distributed according to $g$.
- Accept $X$ as a sample from $f$ if $U \le \displaystyle\frac{f(X)}{h(X)}$, where $U$ is a uniform random variable on $[0, 1]$ drawn independently.
- Otherwise reject $X$ and go back to Step 2 (sample a new $X$). A toy example follows below.
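Before analyzing why this works, here is a minimal sketch of the scheme (added here, not from the notes) for a toy target: the Beta(2, 2) density $f(x) = 6x(1-x)$ on $[0, 1]$ with uniform envelope $g = \text{unif}(0, 1)$ and $c = 3/2$, so that $f \le cg$.
# Acceptance-rejection sketch: target f(x) = 6x(1-x) on [0, 1] (Beta(2, 2)),
# envelope g = unif(0, 1), c = 1.5 (the maximum of f), so f(x) <= c * g(x).
ar_beta22 <- function(nsamp) {
  out <- numeric(nsamp)
  i <- 0
  while (i < nsamp) {
    x <- runif(1)                       # sample X ~ g
    u <- runif(1)                       # independent unif(0, 1)
    if (u <= 6 * x * (1 - x) / 1.5) {   # accept with probability f(X) / h(X)
      i <- i + 1
      out[i] <- x
    }
  }
  out
}
mean(ar_beta22(10000))   # approximately 1/2, the Beta(2, 2) mean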
Why does AR sampling work?
Definition (Body of a function). For a nonnegative, integrable function $f$, its body is defined and denoted by \(B_f = \{(x, y): 0 \le y \le f(x) \}.\)
- Thus the volume of $B_f$ is $|B_f| = \int_{-\infty}^{\infty}f(x)dx = \int_{-\infty}^{\infty}\int_0^{f(x)}1\,dy\,dx$.
Theorem 1. Suppose random variable $X$ has density $g$, $U\sim\text{unif}[0, 1]$ is independent of $X$, and there exists $c > 0$ such that $f(x) \le c g(x)$ for all $x$. Then the random vector $(X, cg(X)U)$ is uniformly distributed over the body $B_{cg}$ of $cg$.
- Theorem 1 states that the AR sampling scheme uniformly samples from $B_{cg}=B_h$.
- The sample is accepted if and only if $0 \le h(X)U = cg(X)U =: Y \le f(X)$, i.e., $(X, Y) \in B_f$. The conditional density of $(X, Y)$ given $\{(X, Y) \in B_f\}$ is merely
\(\frac{|B_{cg}|^{-1}\mathbb{I}_{B_{cg}}(x,y)}{P[(X, Y) \in B_f]}\mathbb{I}_{B_f}(x,y)
= \frac{|B_{cg}|^{-1}\mathbb{I}_{B_{cg}}(x,y)}{|B_f|/|B_{cg}|}\mathbb{I}_{B_f}(x,y)
= \frac{1}{|B_f|}\mathbb{I}_{B_f}(x,y).\)
That is, $(X,Y) \mid \{(X, Y)\in B_f\} \sim \text{unif}(B_f)$.
- This means the accepted sample $(X, Y)$ is drawn according to $\text{unif}(B_f)$.
- Now the marginal density of $X$ is $f$. To see this, note that when $(X, Y) \sim \text{unif}(B_f)$,
\begin{align}
P(X \in A) &= \frac{|B_{\bar{A}}|}{|B_f|}, \quad \bar{A} = B_f \cap \{(x, y): x\in A\} = \{(x, y): x \in A, ~ 0 \le y \le f(x)\} \\
&= \frac{\int_A f(x)dx}{1}, \quad \because |B_f| = \int_{-\infty}^{\infty}f(x)dx = 1 \\
&= \int_A f(x)dx.
\end{align}
- Thus the accepted sample $X$ is drawn according to $f$!
- The total acceptance probability is
\begin{align}
P\left(U \le \frac{f(X)}{cg(X)}\right)
&= \int_{-\infty}^{\infty}\int_{0}^{f(x)/[cg(x)]} 1\cdot g(x)\, du\, dx \\
&= \int_{-\infty}^{\infty} g(x) \int_{0}^{f(x)/[cg(x)]} 1\, du\, dx \\
&= \int_{-\infty}^{\infty} g(x) \frac{f(x)}{cg(x)}\, dx \\
&= \int_{-\infty}^{\infty} \frac{f(x)}{c}\, dx \\
&= \frac{1}{c} \int_{-\infty}^{\infty} f(x)\, dx = \frac{1}{c}.
\end{align}
Proof of Theorem 1
We want to show that the joint density of $(X, cg(X)U)$ is \(\frac{1}{|B_{cg}|}\mathbb{I}_{B_{cg}}(x, y).\)
Let $Y = cg(X)U$. Then $Y \mid \{X=x\} = cg(x) U \sim \text{unif}[0, cg(x)]$. That is, the conditional density of $Y$ given $X$ is
\(p_{Y|X}(y|x) = \frac{1}{cg(x)}\mathbb{I}_{[0, cg(x)]}(y).\)
By construction, the marginal density of $X$ is $p_X(x) = g(x)$. Therefore the joint density of $(X, Y)$ is
\begin{align}
p_{XY}(x, y) &= p_X(x)p_{Y|X}(y|x) = \frac{1}{c}\mathbb{I}_{[0, cg(x)]}(y) \\
&= \begin{cases} 1/c, & \text{if } 0 \le y \le c g(x), \\ 0, & \text{otherwise} \end{cases} \\
&= \frac{1}{c}\mathbb{I}_{B_{cg}}(x,y).
\end{align}
Now since
\(1 = \int_{-\infty}^{\infty}\int_0^{cg(x)}\frac{1}{c}\,dy\,dx = \frac{1}{c}|B_{cg}|,\)
we have $1/c = 1/|B_{cg}|$, so
\(p_{XY}(x, y) = \frac{1}{|B_{cg}|}\mathbb{I}_{B_{cg}}(x, y),\)
as desired.
Example: Marsaglia
One way to sample from the unit disk is to sample $(U, V)$ from $\text{unif}[-1, 1]\times\text{unif}[-1, 1]$, and discard the sample if $U^2 + V^2 > 1$ and resample.
We have $X=(U, V)$.
- Target density: $f(u, v) = \frac{1}{\pi}\mathbb{I}_{\{u^2+v^2<1\}}(u, v)$ (uniform on the unit disk)
- Sampling density: $g(u, v) = \frac{1}{4}\mathbb{I}_{\{|u| < 1, |v| < 1\}}(u, v)$ (uniform on $[-1, 1]^2$)
- Envelope: $h(u, v) = \frac{1}{\pi}\mathbb{I}_{\{|u| < 1, |v| < 1\}}(u, v) = \frac{4}{\pi}g(u, v)$, hence $c=4/\pi$.
- Acceptance criterion: \(\frac{f(U,V)}{h(U,V)} = \frac{\mathbb{I}_{\{U^2+V^2 < 1\}}}{\mathbb{I}_{\{|U| < 1, |V| < 1\}}} = \begin{cases} 1, & \text{if } U^2 + V^2 < 1 \\ 0, & \text{otherwise} \end{cases}\)
- Thus we accept a sample from $g$ if and only if $U^2 + V^2 < 1$.
Example: gamma random numbers
Recall that the Gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$ has density \(f_{\Gamma}(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta}\) for $x \ge 0$. If $X \sim \text{Gamma}(\alpha, \beta)$, then $cX \sim \text{Gamma}(\alpha, c\beta)$. Hence it suffices to sample from $\text{Gamma}(\alpha, 1)$. Furthermore, if $X\sim \text{Gamma}(\alpha, 1)$ with $\alpha > 1$, then $X \stackrel{d}{=} Y + Z$, where $Y \sim \text{Gamma}(\lfloor \alpha \rfloor, 1)$ and $Z \sim \text{Gamma}(\alpha - \lfloor \alpha \rfloor, 1)$ is independent of $Y$. Here $Y$ can be generated by summing $\lfloor \alpha \rfloor$ independent $\text{Exp}(1)$ random variables. Therefore we only need to sample from $\text{Gamma}(\alpha, 1)$, $\alpha \in (0, 1)$.
If $0 < \alpha < 1$, we see that \(x^{\alpha - 1}e^{-x} \le \begin{cases} x^{\alpha - 1}, & \text{if } 0 \le x \le 1, \\ e^{-x}, & \text{otherwise}. \end{cases}\) Thus we choose \(h(x) = \begin{cases} x^{\alpha - 1}/\Gamma(\alpha), & \text{if } 0 \le x \le 1, \\ e^{-x}/\Gamma(\alpha), & \text{otherwise}. \end{cases}\) leading to \(g(x) = \begin{cases} x^{\alpha - 1}/(1/\alpha + 1/e), & \text{if } 0 \le x \le 1, \\ e^{-x}/(1/\alpha + 1/e), & \text{otherwise}. \end{cases}\) and \(c = \frac{1}{\Gamma(\alpha)}\left(\frac{1}{\alpha} + \frac{1}{e}\right).\) Density $g$ has cdf \(G(x) = \begin{cases} x^{\alpha}/(1 + \alpha/e), & \text{if } 0 \le x \le 1, \\ \frac{1 + \alpha/e - \alpha e^{-x}}{1 + \alpha/e}, & \text{otherwise}. \end{cases}\) whose inverse is \(G^{-1}(u) = \begin{cases} [(1 + \alpha/e)u]^{1/\alpha}, & \text{if } 0 \le u \le 1/[1+\alpha/e], \\ -\log(1/\alpha + 1/e) - \log(1 - u), & 1/[1+\alpha/e] \le u < 1. \end{cases}\)
gengamma_ar <- function(nsamp, alph) {
# sample X from g
v <- runif(nsamp) # unif rv for inverse method
idx <- v > 1 / (1 + alph * exp(-1))
x <- numeric(nsamp)
x[idx] = -log(1 / alph + exp(-1)) - log(1 - v[idx])
x[!idx] = ((1 + alph * exp(-1)) * v[!idx])^(1 / alph)
# test acceptance
u <- runif(nsamp)
idx2 <- (x > 1)
accept <- logical(nsamp)
accept[idx2] <- (u[idx2] < x[idx2]^(alph - 1))
accept[!idx2] <- (u[!idx2] < exp(-x[!idx2]))
x[accept]
}
n <- 2000
alph <- 0.5
x <- gengamma_ar(n, alph)
length(x)
length(x) / n
1501
0.7505
gamma(0.5) / (1 / alph + exp(-1) ) # acceptance ratio
0.748540580270693
gamdata <- data.frame(x = x)
plt <- ggplot(gamdata, aes(x=x)) +
geom_histogram(binwidth=0.25, fill="white", color="black") +
geom_density(aes(y=0.25 * ..count..), color="purple")
print(plt)
Multivariate random numbers
Batch methods
Multinomial
Suppose we want to sample $(X_1, \dotsc, X_k) \sim \text{mult}(n, p_1, \dotsc, p_k)$, where $\sum_{i=1}^k p_i = 1$ and $\sum_{i=1}^k X_i = n$.
- Method 1: draw $n$ independent realizations from the pmf $(p_1, \dotsc, p_k)$ and count how many fall in each category (a short sketch follows below).
- An easy way is to declare category $j$, $j \in \{1, \dotsc, k\}$, if \(U \in \left[\sum_{i=0}^{j-1} p_i, \sum_{i=0}^{j} p_i \right),\) where $U \sim \text{unif}[0, 1]$ and $p_0 = 0$.
- Method 2: sample $k$ independent Poisson random variables $(X_1, \dotsc, X_k)$ with means $\lambda p_1, \dotsc, \lambda p_k$. If the total number of successes $\sum_{i=1}^k X_i$ is equal to $n$, then the conditional distribution of $(X_1, \dotsc, X_k)$ is the desired multinomial.
- We must be fortunate for the sum to be exactly $n$.
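A short sketch of Method 1 (added here, not in the original notes), using findInterval to classify each uniform draw by the cumulative probabilities.
# Method 1 sketch: classify n iid uniforms into categories 1, ..., k using
# the cumulative probabilities, then tabulate the counts.
rmultinom_unif <- function(n, p) {
  u <- runif(n)
  categories <- findInterval(u, cumsum(c(0, p)))   # category of each draw
  tabulate(categories, nbins = length(p))          # counts (X_1, ..., X_k)
}
rmultinom_unif(10, c(0.2, 0.3, 0.5))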
Multivariate normal
Suppose we want to sample $\mathbf{X} = (X_1, \dotsc, X_p)$ from MVN $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. If we can find $\mathbf{L} \in \mathbb{R}^{p \times p}$ such that $\mathbf{L}^T\mathbf{L} = \boldsymbol{\Sigma}$, then \(\mathbf{L}^T\mathbf{Z} + \boldsymbol{\mu}, \quad \mathbf{Z} \sim N(0, \mathbf{I}_p)\) follows the desired distribution.
- Random vector $\mathbf{Z}$ consists of $p$ independent standard normal random numbers.
- Possible choices of $\mathbf{L}$ are:
- Cholesky decomposition of $\boldsymbol{\Sigma}$: take $\mathbf{L}$ to be the upper-triangular Cholesky factor satisfying $\mathbf{L}^T\mathbf{L} = \boldsymbol{\Sigma}$, so that $\mathbf{L}^T$ is lower triangular (a small Cholesky-based sketch in R follows below).
- Matrix square root: if $\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$ is an eigenvalue decomposition of $\boldsymbol{\Sigma}$, where $\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \dotsc, \lambda_p)$ with $\lambda_i \ge 0$, then \(\mathbf{L} = \mathbf{Q}\boldsymbol{\Lambda}^{1/2}\mathbf{Q}^T, \quad \boldsymbol{\Lambda}^{1/2} = \text{diag}(\lambda_1^{1/2}, \dotsc, \lambda_p^{1/2}) .\) Such a matrix is symmetric and positive semidefinite, and is often denoted by $\boldsymbol{\Sigma}^{1/2}$.
- For large $p$, finding such a decomposition is computationally challenging.
Multivariate $t$
A multivariate $t$ distribution with degrees of freedom $\nu$, scale matrix $\boldsymbol{\Sigma}$, and location vector $\boldsymbol{\mu}$ is the distribution of the random vector \(\mathbf{T} = \frac{1}{\sqrt{\chi^2_{\nu}/\nu}}\mathbf{X} + \boldsymbol{\mu},\) where $\mathbf{X} \sim N(0, \boldsymbol{\Sigma})$, and $\chi^2_{\nu}$ is the chi-square random variable with $\nu$ degrees of freedom, independent of $\mathbf{X}$.
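A short sketch of this construction (added, not from the notes), again using the Cholesky factor of the scale matrix; the parameter values are arbitrary.
# Multivariate t: scale a N(0, Sigma) draw by an independent sqrt(chi^2_nu / nu).
rmvt_simple <- function(nsamp, nu, mu, Sigma) {
  p <- length(mu)
  Z <- matrix(rnorm(nsamp * p), nrow = nsamp) %*% chol(Sigma)  # rows ~ N(0, Sigma)
  w <- sqrt(rchisq(nsamp, df = nu) / nu)                       # independent scaling
  sweep(Z / w, 2, mu, "+")                                     # rows ~ t_nu(mu, Sigma)
}
Tsamp <- rmvt_simple(1000, nu = 5, mu = c(0, 0), Sigma = diag(2))
cov(Tsamp)   # approximately (nu / (nu - 2)) * Sigma = (5/3) * I for nu = 5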
Sequential sampling
In many cases we can sample a random vector by sampling each component in turn and conditioning:
\begin{align}
p_{X_1, \dotsc, X_p}(x_1, \dotsc, x_p)
&= p_{X_1}(x_1)\prod_{j=2}^p p_{X_j|X_1, \dotsc, X_{j-1}}(x_j | x_1, \dotsc, x_{j-1}) \\
&= p_{X_1}(x_1)\, p_{X_2|X_1}(x_2|x_1) \prod_{j=3}^p p_{X_j|X_1, \dotsc, X_{j-1}}(x_j | x_1, \dotsc, x_{j-1}) \\
&= \dotsb
\end{align}
Multinomial
For $(X_1, \dotsc, X_k) \sim \text{mult}(n, p_1, \dotsc, p_k)$, it is immediate to see that $X_1 \sim B(n, p_1)$. Given $X_1 = x_1$, $(X_2, \dotsc, X_k) \sim \text{mult}(n - x_1, p_2 / (1 - p_1), \dotsc, p_k / (1 - p_1))$. Hence $X_2 \mid \{X_1 = x_1\} \sim B(n - x_1, p_2 / (1 - p_1))$, and so on.
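A sketch of this sequential scheme (added here; rbinom is used for the conditional binomial draws).
# Sequential multinomial sampling: draw each component from its conditional
# binomial given the components sampled so far.
rmultinom_seq <- function(n, p) {
  k <- length(p)
  x <- numeric(k)
  remaining <- n      # trials not yet allocated
  prem <- 1           # probability mass not yet allocated
  for (j in seq_len(k - 1)) {
    x[j] <- rbinom(1, remaining, p[j] / prem)
    remaining <- remaining - x[j]
    prem <- prem - p[j]
  }
  x[k] <- remaining   # the last category takes whatever is left
  x
}
rmultinom_seq(10, c(0.2, 0.3, 0.5))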
Multivariate normal
If we want to sample $\mathbf{X} = (X_1, \dotsc, X_p)$ from MVN with mean $\boldsymbol{\mu}=(\mu_1, \dotsc, \mu_p)^T$ and covariance matrix $\boldsymbol{\Sigma} = (\sigma_{ij})$, then note that the first component $X_1 \sim N(\mu_1, \sigma_{11})$. From the conditional distribution formula for multivariate normal, we see that
\((X_2, \dotsc, X_p) \mid \{X_1 = x_1\} \sim N(\bar{\boldsymbol{\mu}}, \bar{\boldsymbol{\Sigma}}), \quad \bar{\boldsymbol{\mu}} = \boldsymbol{\mu}_2 + \boldsymbol{\Sigma}_{12}^T(x_1 - \mu_1)/\sigma_{11}, \quad \bar{\boldsymbol{\Sigma}} = \boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{12}^T\boldsymbol{\Sigma}_{12}/\sigma_{11},\)
if we partition
\begin{align}
\boldsymbol{\mu} &= (\mu_1, \boldsymbol{\mu}_2)^T, \\
\boldsymbol{\Sigma} &= \begin{bmatrix}
\sigma_{11} & \boldsymbol{\Sigma}_{12} \\
\boldsymbol{\Sigma}_{12}^T & \boldsymbol{\Sigma}_{22}
\end{bmatrix}.
\end{align}
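A bivariate sketch of this sequential scheme (added here, with arbitrary parameter values): sample $X_1$ from its marginal, then $X_2$ from its conditional given $X_1$.
# Sequential sampling of a bivariate normal:
# X1 ~ N(mu1, s11), then X2 | X1 = x1 ~ N(mu2 + s12 (x1 - mu1) / s11, s22 - s12^2 / s11).
mu <- c(1, -1)
s11 <- 2; s12 <- 0.5; s22 <- 1
n <- 1000
x1 <- rnorm(n, mean = mu[1], sd = sqrt(s11))
x2 <- rnorm(n, mean = mu[2] + s12 * (x1 - mu[1]) / s11, sd = sqrt(s22 - s12^2 / s11))
cov(cbind(x1, x2))   # approximately matrix(c(s11, s12, s12, s22), 2, 2)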