[Statistical Computing] 6. Non-linear Equations
Computational Statistics
Table of Contents
- Computer Arithmetic
- LU decomposition
- Cholesky Decomposition
- QR Decomposition
- Iterative Methods
- Non-linear Equations
- Optimization
- Numerical Integration
- Random Number Generation
- Monte Carlo Integration
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] fansi_0.5.0 digest_0.6.27 utf8_1.2.1 crayon_1.4.1
[5] IRdisplay_1.0 repr_1.1.3 lifecycle_1.0.0 jsonlite_1.7.2
[9] evaluate_0.14 pillar_1.6.1 rlang_0.4.11 uuid_0.1-4
[13] vctrs_0.3.8 ellipsis_0.3.2 IRkernel_1.1 tools_3.6.3
[17] compiler_3.6.3 base64enc_0.1-3 pbdZMQ_0.3-5 htmltools_0.5.1.1
Solving nonlinear equations
Problem
- Find $\mathbf{x}$ such that
\(f(\mathbf{x}) = 0.\)
- If $f(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}$, then we now know quite well how to solve this.
- However, if $f$ is nonlinear in $\mathbf{x}$, closed-form solutions are generally out of reach – there is no “direct” method.
- Iterative numerical methods come to the rescue.
- Examples:
- Find a root of $f(x) = x^4 - \sin x$.
- Find a maximizer of $g(x) = \frac{\log x}{1 + x}$, by solving \(f(x) = g'(x) = \frac{1 + x^{-1} - \log x}{(1 + x)^2} = 0.\) (A quick numerical check follows this list.)
- Nonlinear equations and optimization are closely related (at least when $g$ is differentiable).
- No distinction between maximization and minimization (“critical points”)
- Statistical applications: MLE, minimizing Bayes risk, nonlinear least squares, MAP estimation, …
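As a quick numerical check of the maximizer example (a sketch of mine using base R's uniroot(), which is not used elsewhere in these notes): the maximizer of $g(x) = \frac{\log x}{1 + x}$ solves $1 + x^{-1} - \log x = 0$.
gprime_num <- function(x) { 1 + 1/x - log(x) }  # numerator of g'(x); the positive denominator does not affect the root
uniroot(gprime_num, c(1, 10))$root              # approximately 3.5911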
Univariate problems
\[f: \mathbb{R} \to \mathbb{R}\]
Bisection
- Assume $f$ is continuous on $[l, r]$ and $f(l)f(r) \le 0$.
- Then by the Intermediate Value Theorem, there exists $x^{\star}$ such that $f(x^{\star})=0$.
- Consider the midpoint $m = (l + r)/2$.
- If $f(m) = 0$, we are done.
- Otherwise, start over with $r \gets m$ if $f(l)f(m) < 0$, or with $l \gets m$ if $f(m)f(r) < 0$.
- Algorithm
bisect <- function(f, l, r, tol=1e-6) {
    # assumes f is continuous on [l, r] with f(l) * f(r) <= 0
    iter <- 0
    while ( r - l > tol ) {
        m <- (l + r) / 2              # midpoint
        if ( f(l) * f(m) < 0 ) {      # sign change in [l, m]: move right end
            r <- m
        } else {                      # otherwise the root lies in [m, r]
            l <- m
        }
        iter <- iter + 1
    }
    list(val = (l + r) / 2, iter = iter)  # return the result, not an invisible assignment
}
- Example: quantile of continuous distribution
F <- function(x) { pt(x, 5) - 0.95 } # Student's t distribution with 5 df, 95%ile
F(1.291)
F(2.582)
ret <- bisect(F, 1.291, 2.582)
ret$val
ret$iter
-0.0765841085370957
0.0253437879217759
2.01504860901833
21
qt(0.95, 5)
2.01504837333302
- After $n$ iterations, the final interval has length $2^{-n}(r^{(0)} - l^{(0)})$.
- If $n$ is large enough, the midpoint of the final interval is an approximate root.
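To tie this bound to the example above, a one-line sanity check (my addition, not in the original notes): the smallest $n$ with $2^{-n}(r^{(0)} - l^{(0)}) \le \text{tol}$ matches the iteration count returned by bisect().
# smallest n with 2^(-n) * (2.582 - 1.291) <= 1e-6
ceiling(log2((2.582 - 1.291) / 1e-6))   # 21, agreeing with ret$iter above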
Fixed-point iteration
- $x \in \mathbb{R}$ is a fixed point of a function $F: \mathbb{R} \to \mathbb{R}$ if $F(x) = x$.
- Fixed-point iteration
\(x^{(k+1)} = F(x^{(k)})\)
- If $F: \mathbb{R}^n \to \mathbb{R}^n$ maps $\mathbf{x} \mapsto \mathbf{R}\mathbf{x} + \mathbf{c}$ for matrix $\mathbf{R}=\mathbf{M}^{-1}\mathbf{K}$, where $\mathbf{M} - \mathbf{K} = \mathbf{A}$ and $\mathbf{c}=\mathbf{M}^{-1}\mathbf{b}$, then FPI reduces to the iterative method for solving the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$ (a small illustration follows this list).
- Fixed-point strategy for finding a root of $f$:
- Determine $F$ such that $f(x) = 0 \iff F(x) = x$;
- Apply FPI.
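As a small illustration of the linear-system special case above (a sketch; the test matrix is my own choice), Jacobi's method takes $\mathbf{M} = \text{diag}(\mathbf{A})$:
A <- matrix(c(4, 1, 1, 3), nrow=2)  # small diagonally dominant test matrix
b <- c(1, 2)
M <- diag(diag(A))                  # Jacobi splitting: M = diag(A), K = M - A
K <- M - A
R <- solve(M) %*% K                 # R = M^{-1} K
cc <- solve(M, b)                   # c = M^{-1} b
x <- c(0, 0)
for (i in 1:25) { x <- drop(R %*% x) + cc }  # FPI: x <- R x + c
x
solve(A, b)                         # direct solution, for comparison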
Example
Find a root of $f(x) = x^4 - \sin x$ on $(0, 1]$.
- Method 1: choose $F_1(x) = (\sin x)^{1/4}$, since $f(x)=0 \iff x = (\sin x)^{1/4}$.
x <- 1.0
for (i in 1:10) {
x <- (sin(x))^0.25
print(x)
}
[1] 0.9577668
[1] 0.9509906
[1] 0.9498498
[1] 0.9496563
[1] 0.9496234
[1] 0.9496178
[1] 0.9496169
[1] 0.9496167
[1] 0.9496167
[1] 0.9496167
- Method 2: choose $F_2(x) = \frac{\sin x}{x^3}$ since $f(x)=0 \iff x = \frac{\sin x}{x^3}$.
x <- 1.0
for (i in 1:10) {
x <- sin(x) / x^3
print(x)
}
[1] 0.841471
[1] 1.251418
[1] 0.4844576
[1] 4.096051
[1] -0.01187393
[1] 7092.524
[1] -2.604367e-12
[1] 1.474333e+23
[1] -2.289953e-70
[1] 1.906983e+139
Contraction mapping theorem
Theorem. Suppose the function $F$ defined on a closed interval $I \subset \mathbb{R}$ satisfies
- $F(x) \in I$ whenever $x \in I$
- $|F(x) - F(y)| \le L|x - y|$ for all $x, y \in I$ (Lipschitz continuity).
Then, if $L \in [0, 1)$, $F$ has a unique fixed point $x^{\star} \in I$. Furthermore, the fixed-point iteration $x^{(k+1)} = F(x^{(k)})$ converges to $x^{\star}$ regardless of the initial point $x^{(0)} \in I$.
Proof. From condition 2,
\begin{align*}
|x^{(k+1)} - x^{(k)}| &= |F(x^{(k)}) - F(x^{(k-1)})| \\
&\le L|x^{(k)} - x^{(k-1)}| \\
&~~\vdots \\
&\le L^k|x^{(1)} - x^{(0)}|.
\end{align*}
Using the triangle inequality, for $m > n$ we have
\begin{align*}
|x^{(n)} - x^{(m)}| &= |x^{(n)} - x^{(n+1)} + x^{(n+1)} - x^{(n+2)} + \dotsb + x^{(m-1)} - x^{(m)}| \\
&\le \sum_{k=n}^{m-1} |x^{(k)} - x^{(k+1)}| \\
&\le \sum_{k=n}^{m-1} L^k |x^{(1)} - x^{(0)}| \tag{*} \\
&\le \sum_{k=n}^{\infty} L^k |x^{(1)} - x^{(0)}| = \frac{L^n}{1 - L} |x^{(1)} - x^{(0)}|.
\end{align*}
Thus the sequence $\{x^{(k)}\}$ is Cauchy and hence convergent. Since the interval $I$ is closed, $x^{\star} = \lim_k x^{(k)} \in I$.
Now by the (Lipschitz) continuity of $F$ and the definition $x^{(k+1)} = F(x^{(k)})$, we see that $x^{\star}$ is a fixed point of $F$.
If there were another fixed point $y^{\star} \neq x^{\star}$, then \(|x^{\star} - y^{\star}| = |F(x^{\star}) - F(y^{\star})| \le L|x^{\star} - y^{\star}|,\) which is a contradiction since $L < 1$.
Remark. By sending $m \to \infty$ in inequality (*), we have the bound \(|x^{(n)} - x^{\star}| \le \frac{L^n}{1 - L}|x^{(1)} - x^{(0)}|\) which provides the rate of convergence to $x^{\star}$. This geometric rate of convergence is called linear convergence.
Back to example
- Recall our example maps $F_1(x) = (\sin x)^{1/4}$ and $F_2(x) = \frac{\sin x}{x^3}$.
- The Lipschitz modulus of a differentiable function is $L = \sup_{x\in I} |F'(x)|$ (why?).
- $F_1$ maps $I=[0.2, 1]$ into a smaller interval, and $F_1'(x) = \frac{\cos x}{4\sin^{3/4}(x)}$ is decreasing on $I$ with $1 > F_1'(0.2) > F_1'(1) > 0$. (A numerical check of the resulting convergence rate follows this list.)
- $F_2$ expands $I=[0.9, 1]$ to $[0.841, 1.075]$, and $F_2'(x) = \frac{x\cos x - 3\sin x}{x^4}$ is increasing on $I$ with $F_2'(1) < -1$.
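To check the linear convergence rate numerically, here is a short sketch (my addition; $x^{\star}$ is taken from the FPI output above). The successive error ratios for $F_1$ should approach $|F_1'(x^{\star})| \approx 0.17$:
F1 <- function(x) { sin(x)^0.25 }
xstar <- 0.9496167   # approximate fixed point from the FPI run above
x <- 1.0
err <- abs(x - xstar)
for (k in 1:6) {
    x <- F1(x)
    err_new <- abs(x - xstar)
    print(err_new / err)  # approaches |F1'(xstar)| = cos(xstar) / (4 * sin(xstar)^0.75)
    err <- err_new
}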
Obvious choice of $F$
- Probably the most obvious choice for $F$ to find a root of $f$ is to set $F(x) = f(x) + x$. This yields an iteration \(x^{(k+1)} = x^{(k)} + f(x^{(k)}).\) (Homework)
Newton’s method (Newton-Raphson)
- Newton’s method was originally developed by, not surprisingly, Isaac Newton, and further refined by Joseph Raphson, to find roots of nonlinear equations $f(x) = 0$: \(x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{f'(x^{(k)})}.\)
- Motivation: first-order Taylor expansion
- If $x^{(k)}$ approximates the root $x^{\star}$ of $f$, then \(0 = f(x^{\star}) = f(x^{(k)}) + f'(\tilde{x})(x^{\star} - x^{(k)})\) for some $\tilde{x}$ on the line segment connecting $x^{(k)}$ and $x^{\star}$.
- If we substitute $\tilde{x}$ with $x^{(k)}$ and $x^{\star}$ with the next approximation $x^{(k+1)}$, we get \(0 = f(x^{(k)}) + f'(x^{(k)})(x^{(k+1)} - x^{(k)}),\) yielding the above iteration.
- Fixed-point iteration: Newton’s method is an FPI with \(F(x) = x - \frac{f(x)}{f'(x)}.\)
- Thus the (local) convergence of Newton’s method is governed by \(F'(x) = 1 - \frac{(f'(x))^2 - f(x)f''(x)}{(f'(x))^2} = \frac{f(x)f''(x)}{(f'(x))^2}\) (assuming $f$ is twice differentiable).
- If $f'(x^{\star}) \neq 0$ and $|f''(x^{\star})| < \infty$, then $F'(x^{\star}) = 0$.
- With this assumption, apply a second-order Taylor expansion of $F$ around $x^{\star}$:
\begin{align*}
x^{(k)} - x^{\star} &= F(x^{(k-1)}) - F(x^{\star}) \\
&= F'(x^{\star})(x^{(k-1)} - x^{\star}) + \frac{1}{2}F''(\tilde{x})(x^{(k-1)} - x^{\star})^2 \\
&= \frac{1}{2}F''(\tilde{x})(x^{(k-1)} - x^{\star})^2,
\end{align*}
for $\tilde{x}$ on the line segment connecting $x^{(k-1)}$ and $x^{\star}$.
- If furthermore $F''$ is continuous and $x^{(0)}$ is sufficiently close to $x^{\star}$, then \(\lim_{k}\frac{|x^{(k)} - x^{\star}|}{|x^{(k-1)} - x^{\star}|^2} = \frac{1}{2}|F''(x^{\star})|.\) Thus Newton’s method has a locally quadratic convergence property (under suitable conditions).
- The local convergence theory requires a good starting point $x^{(0)}$ (“sufficiently close to $x^{\star}$”). With a good starting point, Newton-Raphson converges very fast to the nearby solution. Otherwise, it can fail miserably.
Examples
- Quantile (cf. bisection)
F <- function(x) { pt(x, 5) - 0.95 } # Student's t distribution with 5 df, 95%ile
Fprime <- function(x) { dt(x, 5) } # derivative of F (density function)
#x <- 1.291
x <- 2.582
options(digits=20)
for (i in 1:10) {
x <- x - (F(x) / Fprime(x))
print(x)
}
[1] 1.7338465618414033997
[1] 1.968962641494075072
[1] 2.0136622631340750367
[1] 2.0150470922847292243
[1] 2.0150483733319273227
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
qt(0.95, 5)
2.01504837333302
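To see the quadratic convergence on the earlier root-finding example, here is a short sketch (my addition, not from the original notes): Newton’s method on $f(x) = x^4 - \sin x$. The number of correct digits roughly doubles each iteration.
f <- function(x) { x^4 - sin(x) }
fprime <- function(x) { 4 * x^3 - cos(x) }  # f'(x)
x <- 1.0
for (i in 1:6) {
    x <- x - f(x) / fprime(x)  # Newton update
    print(x)
}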
- The following animation is the result of applying Newton’s method for minimizing $g(x) = \sin x$, i.e., finding a root of $f(x) = \cos x$, starting from $x^{(0)} = 2.0, 2.75, 4.0$, respectively.
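In case the animation does not render, a bare-bones numeric stand-in (a sketch, not the original animation code): depending on the starting point, the iterates settle on different critical points of $\sin x$ ($\pi/2$, $3\pi/2$, $7\pi/2$, …), sometimes after jumping far away first.
newton_cos <- function(x, n = 20) {
    # Newton on f(x) = cos(x), f'(x) = -sin(x): critical points of sin(x)
    for (i in 1:n) { x <- x - cos(x) / (-sin(x)) }
    x
}
sapply(c(2.0, 2.75, 4.0), newton_cos)  # three different starting points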
Multivariate problems
Goal: to solve
\begin{align*}
f_1(x_1, \dotsc, x_n) &= 0 \\
&~~\vdots \\
f_n(x_1, \dotsc, x_n) &= 0.
\end{align*}
- Example
\begin{align*}
2x_1 - x_2 &= e^{-x_1} \\
-x_1 + 2x_2 &= e^{-x_2}.
\end{align*}
Newton’s method
- Same idea as in the univariate version: linearize $f$ around $\mathbf{x}=(x_1, \dotsc, x_n)$.
- Iteration: \(\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - [Jf(\mathbf{x}^{(k)})]^{-1}f(\mathbf{x}^{(k)}),\) where \(Jf(\mathbf{x}) = \begin{bmatrix} \frac{\partial f_1(\mathbf{x})}{\partial x_1} & \dotsc & \frac{\partial f_1(\mathbf{x})}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n(\mathbf{x})}{\partial x_1} & \dotsc & \frac{\partial f_n(\mathbf{x})}{\partial x_n} \end{bmatrix}.\)
- Remember that at each iteration we should solve the linear system \([Jf(\mathbf{x}^{(k)})]\Delta\mathbf{x} = -f(\mathbf{x}^{(k)})\) (using LU decomposition, etc.) to get $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \Delta\mathbf{x}$. (A worked sketch on the example system follows.)
Variants
- Modified Newton: replace $Jf(\mathbf{x}^{(k)})$ with $J = Jf(\mathbf{x}^{(0)})$. (A sketch follows this list.)
- Jacobi’s method: replace $Jf(\mathbf{x}^{(k)})$ with $\text{diag}(Jf(\mathbf{x}^{(k)}))$.
- Gauss-Seidel: replace $Jf(\mathbf{x}^{(k)})$ with $\text{lowertri}(Jf(\mathbf{x}^{(k)}))$.
- SOR: replace $Jf(\mathbf{x}^{(k)})$ with $\frac{1}{\omega}\text{diag}(Jf(\mathbf{x}^{(k)})) + \text{strictlowertri}(Jf(\mathbf{x}^{(k)}))$.
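A minimal sketch of the first variant (reusing f and Jf from the previous sketch; again my own illustration): freeze the Jacobian at $\mathbf{x}^{(0)}$. A real implementation would also reuse its LU factors, so each step becomes much cheaper, at the price of linear rather than quadratic convergence.
J0 <- Jf(c(1, 1))   # Jacobian frozen at the starting point
x <- c(1, 1)
for (i in 1:25) {
    x <- x - solve(J0, f(x))  # modified Newton update with fixed J0
}
x                   # still approaches the solution, just more slowly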
Comments