[Statistical Computing] 6. Non-linear Equations
Computational Statistics
Table of Contents
- Computer Arithmetic
- LU decomposition
- Cholesky Decomposition
- QR Decomposition
- Iterative Methods
- Non-linear Equations
- Optimization
- Numerical Integration
- Random Number Generation
- Monte Carlo Integration
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] fansi_0.5.0 digest_0.6.27 utf8_1.2.1 crayon_1.4.1
[5] IRdisplay_1.0 repr_1.1.3 lifecycle_1.0.0 jsonlite_1.7.2
[9] evaluate_0.14 pillar_1.6.1 rlang_0.4.11 uuid_0.1-4
[13] vctrs_0.3.8 ellipsis_0.3.2 IRkernel_1.1 tools_3.6.3
[17] compiler_3.6.3 base64enc_0.1-3 pbdZMQ_0.3-5 htmltools_0.5.1.1
Solving nonlinear equations
Problem
- Find $\mathbf{x}$ such that
\(f(\mathbf{x}) = 0.\)
- If $f(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}$, then we now know quite well how to solve this.
- However, if $f$ is nonlinear in $\mathbf{x}$, closed-form solutions are generally out of reach – there is no “direct” method.
- Iterative numerical methods come to the rescue.
- Examples:
- Find a root of $f(x) = x^4 - \sin x$.
- Find a maximizer of $g(x) = \frac{\log x}{1 + x}$, by solving \(f(x) = g'(x) = \frac{1 + x^{-1} - \log x}{(1 + x)^2} = 0.\) (A quick numerical check follows this list.)
- Nonlinear equations and optimization are closely related (at least when $g$ is differentiable).
- No distinction between maximization and minimization (“critical points”)
- Statistical applications: MLE, minimizing Bayes risk, nonlinear least squares, MAP estimation, …
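As a quick numerical check of the maximizer example (a sketch of mine using base R's uniroot(), which is not used elsewhere in these notes): the maximizer of $g(x) = \frac{\log x}{1 + x}$ solves $1 + x^{-1} - \log x = 0$.
gprime_num <- function(x) { 1 + 1/x - log(x) }  # numerator of g'(x); the positive denominator does not affect the root
uniroot(gprime_num, c(1, 10))$root              # approximately 3.5911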
Univariate problems
\[f: \mathbb{R} \to \mathbb{R}\]
Bisection
- Assume $f$ is continuous on $[l, r]$ and $f(l)f(r) \le 0$.
- Then by the Intermediate Value Theorem, there exists $x^{\star}$ such that $f(x^{\star})=0$.
- Consider the midpoint $m = (l + r)/2$.
- If $f(m) = 0$, we are done.
- Otherwise, start over with $r \gets m$ if $f(l)f(m) < 0$, or with $l \gets m$ if $f(m)f(r) < 0$.
- Algorithm
bisect <- function(f, l, r, tol=1e-6) {
    # assumes f is continuous on [l, r] with f(l) * f(r) <= 0
    iter <- 0
    while ( r - l > tol ) {
        m <- (l + r) / 2              # midpoint
        if ( f(l) * f(m) < 0 ) {      # sign change in [l, m]: move right end
            r <- m
        } else {                      # otherwise the root lies in [m, r]
            l <- m
        }
        iter <- iter + 1
    }
    list(val = (l + r) / 2, iter = iter)  # return the result, not an invisible assignment
}
- Example: quantile of continuous distribution
F <- function(x) { pt(x, 5) - 0.95 } # Student's t distribution with 5 df, 95%ile
F(1.291)
F(2.582)
ret <- bisect(F, 1.291, 2.582)
ret$val
ret$iter
-0.0765841085370957
0.0253437879217759
2.01504860901833
21
qt(0.95, 5)
2.01504837333302
- After $n$ iterations, the final interval has length $2^{-n}(r^{(0)} - l^{(0)})$.
- If $n$ is large enough, the midpoint of the final interval is an approximate root.
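To tie this bound to the example above, a one-line sanity check (my addition, not in the original notes): the smallest $n$ with $2^{-n}(r^{(0)} - l^{(0)}) \le \text{tol}$ matches the iteration count returned by bisect().
# smallest n with 2^(-n) * (2.582 - 1.291) <= 1e-6
ceiling(log2((2.582 - 1.291) / 1e-6))   # 21, agreeing with ret$iter above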
Fixed-point iteration
- $x \in \mathbb{R}$ is a fixed point of a function $F: \mathbb{R} \to \mathbb{R}$ if $F(x) = x$.
- Fixed-point iteration
\(x^{(k+1)} = F(x^{(k)})\)
- If $F: \mathbb{R}^n \to \mathbb{R}^n$ maps $\mathbf{x} \mapsto \mathbf{R}\mathbf{x} + \mathbf{c}$ for matrix $\mathbf{R}=\mathbf{M}^{-1}\mathbf{K}$, where $\mathbf{M} - \mathbf{K} = \mathbf{A}$ and $\mathbf{c}=\mathbf{M}^{-1}\mathbf{b}$, then FPI reduces to the iterative method for solving the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$ (a small illustration follows this list).
- Fixed-point strategy for finding a root of $f$:
- Determine $F$ such that $f(x) = 0 \iff F(x) = x$;
- Apply FPI.
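As a small illustration of the linear-system special case above (a sketch; the test matrix is my own choice), Jacobi's method takes $\mathbf{M} = \text{diag}(\mathbf{A})$:
A <- matrix(c(4, 1, 1, 3), nrow=2)  # small diagonally dominant test matrix
b <- c(1, 2)
M <- diag(diag(A))                  # Jacobi splitting: M = diag(A), K = M - A
K <- M - A
R <- solve(M) %*% K                 # R = M^{-1} K
cc <- solve(M, b)                   # c = M^{-1} b
x <- c(0, 0)
for (i in 1:25) { x <- drop(R %*% x) + cc }  # FPI: x <- R x + c
x
solve(A, b)                         # direct solution, for comparison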
Example
Find a root of $f(x) = x^4 - \sin x$ on $(0, 1]$.
- Method 1: choose $F_1(x) = (\sin x)^{1/4}$, since $f(x)=0 \iff x = (\sin x)^{1/4}$.
x <- 1.0
for (i in 1:10) {
x <- (sin(x))^0.25
print(x)
}
[1] 0.9577668
[1] 0.9509906
[1] 0.9498498
[1] 0.9496563
[1] 0.9496234
[1] 0.9496178
[1] 0.9496169
[1] 0.9496167
[1] 0.9496167
[1] 0.9496167
- Method 2: choose $F_2(x) = \frac{\sin x}{x^3}$ since $f(x)=0 \iff x = \frac{\sin x}{x^3}$.
x <- 1.0
for (i in 1:10) {
x <- sin(x) / x^3
print(x)
}
[1] 0.841471
[1] 1.251418
[1] 0.4844576
[1] 4.096051
[1] -0.01187393
[1] 7092.524
[1] -2.604367e-12
[1] 1.474333e+23
[1] -2.289953e-70
[1] 1.906983e+139
Contraction mapping theorem
Theorem. Suppose the function $F$ defined on a closed interval $I \subset \mathbb{R}$ satisfies
- $F(x) \in I$ whenever $x \in I$
- $|F(x) - F(y)| \le L|x - y|$ for all $x, y \in I$ (Lipschitz continuity).
Then, if $L \in [0, 1)$, $F$ has a unique fixed point $x^{\star} \in I$. Furthermore, the fixed-point iteration $x^{(k+1)} = F(x^{(k)})$ converges to $x^{\star}$ regardless of the initial point $x^{(0)} \in I$.
Proof. From condition 2,
\begin{align*}
|x^{(k+1)} - x^{(k)}| &= |F(x^{(k)}) - F(x^{(k-1)})| \\
&\le L|x^{(k)} - x^{(k-1)}| \\
&~~\vdots \\
&\le L^k|x^{(1)} - x^{(0)}|.
\end{align*}
Using the triangle inequality, for $m > n$ we have
\begin{align*}
|x^{(n)} - x^{(m)}| &= |x^{(n)} - x^{(n+1)} + x^{(n+1)} - x^{(n+2)} + \dotsb + x^{(m-1)} - x^{(m)}| \\
&\le \sum_{k=n}^{m-1} |x^{(k)} - x^{(k+1)}| \\
&\le \sum_{k=n}^{m-1} L^k |x^{(1)} - x^{(0)}| \tag{*} \\
&\le \sum_{k=n}^{\infty} L^k |x^{(1)} - x^{(0)}| = \frac{L^n}{1 - L} |x^{(1)} - x^{(0)}|.
\end{align*}
Thus the sequence $\{x^{(k)}\}$ is Cauchy and hence convergent. Since the interval $I$ is closed, $x^{\star} = \lim_k x^{(k)} \in I$.
Now by the (Lipschitz) continuity of $F$ and the definition $x^{(k+1)} = F(x^{(k)})$, we see that $x^{\star}$ is a fixed point of $F$.
If there were another fixed point $y^{\star} \neq x^{\star}$, then \(|x^{\star} - y^{\star}| = |F(x^{\star}) - F(y^{\star})| \le L|x^{\star} - y^{\star}|,\) which is a contradiction since $L < 1$.
Remark. By sending $m \to \infty$ in inequality (*), we have the bound \(|x^{(n)} - x^{\star}| \le \frac{L^n}{1 - L}|x^{(1)} - x^{(0)}|\) which provides the rate of convergence to $x^{\star}$. This geometric rate of convergence is called linear convergence.
Back to example
- Recall our example maps $F_1(x) = (\sin x)^{1/4}$ and $F_2(x) = \frac{\sin x}{x^3}$.
- The Lipschitz modulus of a differentiable function is $L = \sup_{x\in I} |F'(x)|$ (why?).
- $F_1$ maps $I=[0.2, 1]$ into a smaller interval, and $F_1'(x) = \frac{\cos x}{4\sin^{3/4}(x)}$ is decreasing on $I$ with $1 > F_1'(0.2) > F_1'(1) > 0$. (A numerical check of the resulting convergence rate follows this list.)
- $F_2$ expands $I=[0.9, 1]$ to $[0.841, 1.075]$, and $F_2'(x) = \frac{x\cos x - 3\sin x}{x^4}$ is increasing on $I$ with $F_2'(1) < -1$.
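To check the linear convergence rate numerically, here is a short sketch (my addition; $x^{\star}$ is taken from the FPI output above). The successive error ratios for $F_1$ should approach $|F_1'(x^{\star})| \approx 0.17$:
F1 <- function(x) { sin(x)^0.25 }
xstar <- 0.9496167   # approximate fixed point from the FPI run above
x <- 1.0
err <- abs(x - xstar)
for (k in 1:6) {
    x <- F1(x)
    err_new <- abs(x - xstar)
    print(err_new / err)  # approaches |F1'(xstar)| = cos(xstar) / (4 * sin(xstar)^0.75)
    err <- err_new
}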
Obvious choice of $F$
- Probably the most obvious choice for $F$ to find a root of $f$ is to set $F(x) = f(x) + x$. This yields an iteration \(x^{(k+1)} = x^{(k)} + f(x^{(k)}).\) (Homework)
Newton’s method (Newton-Raphson)
- Newton’s method was originally developed by, not surprisingly, Isaac Newton, and further refined by Joseph Raphson, to find roots of nonlinear equations $f(x) = 0$: \(x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{f'(x^{(k)})}.\)
- Motivation: first-order Taylor expansion
- If $x^{(k)}$ approximates the root $x^{\star}$ of $f$, then \(0 = f(x^{\star}) = f(x^{(k)}) + f'(\tilde{x})(x^{\star} - x^{(k)})\) for some $\tilde{x}$ on the line segment connecting $x^{(k)}$ and $x^{\star}$.
- If we substitute $\tilde{x}$ with $x^{(k)}$ and $x^{\star}$ with the next approximation $x^{(k+1)}$, we get \(0 = f(x^{(k)}) + f'(x^{(k)})(x^{(k+1)} - x^{(k)}),\) yielding the above iteration.
- Fixed-point iteration: Newton’s method is an FPI with \(F(x) = x - \frac{f(x)}{f'(x)}.\)
- Thus the (local) convergence of Newton’s method is governed by \(F'(x) = 1 - \frac{(f'(x))^2 - f(x)f''(x)}{(f'(x))^2} = \frac{f(x)f''(x)}{(f'(x))^2}\) (assuming $f$ is twice differentiable).
- If $f'(x^{\star}) \neq 0$ and $|f''(x^{\star})| < \infty$, then $F'(x^{\star}) = 0$.
- With this assumption, apply a second-order Taylor expansion of $F$ around $x^{\star}$:
\begin{align*}
x^{(k)} - x^{\star} &= F(x^{(k-1)}) - F(x^{\star}) \\
&= F'(x^{\star})(x^{(k-1)} - x^{\star}) + \frac{1}{2}F''(\tilde{x})(x^{(k-1)} - x^{\star})^2 \\
&= \frac{1}{2}F''(\tilde{x})(x^{(k-1)} - x^{\star})^2,
\end{align*}
for $\tilde{x}$ on the line segment connecting $x^{(k-1)}$ and $x^{\star}$.
- If furthermore $F''$ is continuous and $x^{(0)}$ is sufficiently close to $x^{\star}$, then \(\lim_{k}\frac{|x^{(k)} - x^{\star}|}{|x^{(k-1)} - x^{\star}|^2} = \frac{1}{2}|F''(x^{\star})|.\) Thus Newton’s method has a locally quadratic convergence property (under suitable conditions).
- The local convergence theory requires a good starting point $x^{(0)}$ (“sufficiently close to $x^{\star}$”). With a good starting point, Newton-Raphson converges very fast to the nearby solution. Otherwise, it can fail miserably.
Examples
- Quantile (cf. bisection)
F <- function(x) { pt(x, 5) - 0.95 } # Student's t distribution with 5 df, 95%ile
Fprime <- function(x) { dt(x, 5) } # derivative of F (density function)
#x <- 1.291
x <- 2.582
options(digits=20)
for (i in 1:10) {
x <- x - (F(x) / Fprime(x))
print(x)
}
[1] 1.7338465618414033997
[1] 1.968962641494075072
[1] 2.0136622631340750367
[1] 2.0150470922847292243
[1] 2.0150483733319273227
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
qt(0.95, 5)
2.01504837333302
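To see the quadratic convergence on the earlier root-finding example, here is a short sketch (my addition, not from the original notes): Newton’s method on $f(x) = x^4 - \sin x$. The number of correct digits roughly doubles each iteration.
f <- function(x) { x^4 - sin(x) }
fprime <- function(x) { 4 * x^3 - cos(x) }  # f'(x)
x <- 1.0
for (i in 1:6) {
    x <- x - f(x) / fprime(x)  # Newton update
    print(x)
}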
- The following animation is the result of applying Newton’s method for minimizing $g(x) = \sin x$, i.e., finding a root of $f(x) = \cos x$, starting from $x^{(0)} = 2.0, 2.75, 4.0$, respectively.
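In case the animation does not render, a bare-bones numeric stand-in (a sketch, not the original animation code): depending on the starting point, the iterates settle on different critical points of $\sin x$ ($\pi/2$, $3\pi/2$, $7\pi/2$, …), sometimes after jumping far away first.
newton_cos <- function(x, n = 20) {
    # Newton on f(x) = cos(x), f'(x) = -sin(x): critical points of sin(x)
    for (i in 1:n) { x <- x - cos(x) / (-sin(x)) }
    x
}
sapply(c(2.0, 2.75, 4.0), newton_cos)  # three different starting points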
Multivariate problems
Goal: to solve
\begin{align*}
f_1(x_1, \dotsc, x_n) &= 0 \\
&~~\vdots \\
f_n(x_1, \dotsc, x_n) &= 0.
\end{align*}
- Example
\begin{align*}
2x_1 - x_2 &= e^{-x_1} \\
-x_1 + 2x_2 &= e^{-x_2}.
\end{align*}
Newton’s method
- Same idea as in the univariate version: linearize $f$ around $\mathbf{x}=(x_1, \dotsc, x_n)$.
- Iteration: \(\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - [Jf(\mathbf{x}^{(k)})]^{-1}f(\mathbf{x}^{(k)}),\) where \(Jf(\mathbf{x}) = \begin{bmatrix} \frac{\partial f_1(\mathbf{x})}{\partial x_1} & \dotsc & \frac{\partial f_1(\mathbf{x})}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n(\mathbf{x})}{\partial x_1} & \dotsc & \frac{\partial f_n(\mathbf{x})}{\partial x_n} \end{bmatrix}.\)
- Remember that at each iteration we should solve the linear system \([Jf(\mathbf{x}^{(k)})]\Delta\mathbf{x} = -f(\mathbf{x}^{(k)})\) (using LU decomposition, etc.) to get $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \Delta\mathbf{x}$. (A worked sketch on the example system follows.)
Variants
- Modified Newton: replace $Jf(\mathbf{x}^{(k)})$ with $J = Jf(\mathbf{x}^{(0)})$. (A sketch follows this list.)
- Jacobi’s method: replace $Jf(\mathbf{x}^{(k)})$ with $\text{diag}(Jf(\mathbf{x}^{(k)}))$.
- Gauss-Seidel: replace $Jf(\mathbf{x}^{(k)})$ with $\text{lowertri}(Jf(\mathbf{x}^{(k)}))$.
- SOR: replace $Jf(\mathbf{x}^{(k)})$ with $\frac{1}{\omega}\text{diag}(Jf(\mathbf{x}^{(k)})) + \text{strictlowertri}(Jf(\mathbf{x}^{(k)}))$.
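A minimal sketch of the first variant (reusing f and Jf from the previous sketch; again my own illustration): freeze the Jacobian at $\mathbf{x}^{(0)}$. A real implementation would also reuse its LU factors, so each step becomes much cheaper, at the price of linear rather than quadratic convergence.
J0 <- Jf(c(1, 1))   # Jacobian frozen at the starting point
x <- c(1, 1)
for (i in 1:25) {
    x <- x - solve(J0, f(x))  # modified Newton update with fixed J0
}
x                   # still approaches the solution, just more slowly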
Comments