[통계계산] 2. LU Decomposition

Computational Statistics

Featured image

[통계계산] 2. LU Decomposition


목차

  1. Computer Arithmetic
  2. LU decomposition
  3. Cholesky Decomposition
  4. QR Decomposition
  5. Interative Methods
  6. Non-linear Equations
  7. Optimization
  8. Numerical Integration
  9. Random Number Generation
  10. Monte Carlo Integration


sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] fansi_0.5.0       digest_0.6.27     utf8_1.2.1        crayon_1.4.1     
 [5] IRdisplay_1.0     repr_1.1.3        lifecycle_1.0.0   jsonlite_1.7.2   
 [9] evaluate_0.14     pillar_1.6.1      rlang_0.4.11      uuid_0.1-4       
[13] vctrs_0.3.8       ellipsis_0.3.2    IRkernel_1.1      tools_3.6.3      
[17] compiler_3.6.3    base64enc_0.1-3   pbdZMQ_0.3-5      htmltools_0.5.1.1

Required R packages

library(Matrix)

Computer methods for solving linear systems of equations

We consider computer algorithms for solving linear equations $\mathbf{A} \mathbf{x} = \mathbf{b}$, a ubiquitous task in statistics.

Our mantra

The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.
– James Gentle, Matrix Algebra, Springer, New York (2007).

Algorithms

Measure of algorithm efficiency

Triangular systems

Idea: turning original problem into an easy one, e.g., triangular system.

Lower triangular system

To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is lower triangular

\[\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.\]

Upper triangular system

To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is upper triangular
\(\begin{pmatrix} a_{11} & \cdots & a_{1,n-1} & a_{1n} \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\ 0 & 0 & 0 & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{pmatrix}.\)

Implementation

set.seed(123) # seed
n <- 5
A <- Matrix(rnorm(n * n), nrow = n)
b <- rnorm(n)
# a random matrix
A
5 x 5 Matrix of class "dgeMatrix"
            [,1]       [,2]       [,3]       [,4]       [,5]
[1,] -0.56047565  1.7150650  1.2240818  1.7869131 -1.0678237
[2,] -0.23017749  0.4609162  0.3598138  0.4978505 -0.2179749
[3,]  1.55870831 -1.2650612  0.4007715 -1.9666172 -1.0260044
[4,]  0.07050839 -0.6868529  0.1106827  0.7013559 -0.7288912
[5,]  0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393
Al <- lower.tri(A) # Returns a matrix of logicals the same size of a given matrix with entries TRUE in the lower triangle
Aupper <- A
Aupper[Al] <- 0
Aupper
5 x 5 Matrix of class "dgeMatrix"
           [,1]      [,2]      [,3]       [,4]       [,5]
[1,] -0.5604756 1.7150650 1.2240818  1.7869131 -1.0678237
[2,]  0.0000000 0.4609162 0.3598138  0.4978505 -0.2179749
[3,]  0.0000000 0.0000000 0.4007715 -1.9666172 -1.0260044
[4,]  0.0000000 0.0000000 0.0000000  0.7013559 -0.7288912
[5,]  0.0000000 0.0000000 0.0000000  0.0000000 -0.6250393
backsolve(Aupper, b) 

<ol class=list-inline><li>14.6233350267841</li><li>22.7861639334402</li><li>-22.9457487682144</li><li>-3.70749941033203</li><li>-2.00597784101513</li></ol>

Some algebraic facts of triangular system

Gaussian Elimination and LU Decomposition

A <- t(Matrix(c(2.0, -4.0, 2.0, 4.0, -9.0, 7.0, 2.0, 1.0, 3.0), ncol=3))  # R used column major ordering
A
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    4   -9    7
[3,]    2    1    3
b <- c(6.0, 20.0, 14.0)
# R's way tp solve linear equation

solve(A, b)
3 x 1 Matrix of class "dgeMatrix"
     [,1]
[1,]    2
[2,]    1
[3,]    3

What happens when we call solve(A, b)?

Elementary operator matrix

\[\mathbf{E}_{jk}(c) \times \mathbf{X} = \mathbf{E}_{jk}(c) \times \begin{pmatrix} & & \\ \cdots & \mathbf{x}_{k\cdot} & \cdots \\ & & \\ \cdots & \mathbf{x}_{j\cdot} & \cdots \\ & & \end{pmatrix} = \begin{pmatrix} & & \\ \cdots & \mathbf{x}_{k\cdot} & \cdots \\ & & \\ \cdots & c \mathbf{x}_{k\cdot} + \mathbf{x}_{j\cdot} & \cdots \\ & & \end{pmatrix}.\]

$2m$ flops (why?).

E21 <- t(Matrix(c(1.0, 0.0, 0.0, -2.0, 1.0, 0.0, 0.0, 0.0, 1.0), nrow=3))  # Step 1, inv(L1)
E21
3 x 3 sparse Matrix of class "dtCMatrix"
           
[1,]  1 . .
[2,] -2 1 .
[3,]  . . 1
# zero (2, 1) entry   
E21 %*% A   # Step 1, A'
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    0   -1    3
[3,]    2    1    3
E31 <- t(Matrix(c(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0), nrow=3))  # Step 2, find the corresponding matrix!
E31
3 x 3 sparse Matrix of class "dtCMatrix"
           
[1,]  1 . .
[2,]  . 1 .
[3,] -1 . 1
# zero (3, 1) entry    
E31 %*% E21 %*% A  # Step2, A''
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    0   -1    3
[3,]    0    5    1


Column 2:
E32 <- t(Matrix(c(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 1.0), nrow=3))  # Step 3, find the corresponding matrix!
E32
3 x 3 sparse Matrix of class "dtCMatrix"
          
[1,] 1 . .
[2,] . 1 .
[3,] . 5 1
# zero (3, 2) entry
E32 %*% E31 %*% E21 %*% A   # Step 3, A'''
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    0   -1    3
[3,]    0    0   16

Thus we have arrived at an upper triangular matrix $\mathbf{U}$.

Gauss transformations

M1 <- E31  # inv(L2)
M1
3 x 3 sparse Matrix of class "dtCMatrix"
           
[1,]  1 . .
[2,]  . 1 .
[3,] -1 . 1
M2 <- E32  # inv(L3)
M2
3 x 3 sparse Matrix of class "dtCMatrix"
          
[1,] 1 . .
[2,] . 1 .
[3,] . 5 1
solve(M1)   # L2; solve(A) gives the inverse of A, but use it with caution (see below)
3 x 3 sparse Matrix of class "dtCMatrix"
          
[1,] 1 . .
[2,] . 1 .
[3,] 1 . 1
solve(M2)   # L3
3 x 3 sparse Matrix of class "dtCMatrix"
           
[1,] 1  . .
[2,] .  1 .
[3,] . -5 1

LU decomposition

Gaussian elimination does \(\mathbf{M}_{n-1} \cdots \mathbf{M}_1 \mathbf{A} = \mathbf{U}.\)
Let

\begin{equation} \mathbf{L} = \mathbf{M}_1^{-1} \cdots \mathbf{M}_{n-1}^{-1} = \begin{pmatrix}
1 & & & &
- c_{21} & \ddots & & &
& & 1 & &
- c_{k+1,1} & & - c_{k+1,k} & &
& & \vdots & & \ddots
- c_{n1} & & - c_{nk} & & & 1 \end{pmatrix}. \end{equation
} Thus we have the LU decomposition \(\mathbf{A} = \mathbf{L} \mathbf{U},\) where $\mathbf{L}$ is unit lower triangular and $\mathbf{U}$ is upper triangular.

Source: http://www.netlib.org

Matrix inversion

Pivoting

\[\begin{bmatrix} x_1 \\ x_ 2 \end{bmatrix} = \begin{bmatrix} 1.001001 \\ 0.998999 \end{bmatrix} \approx \begin{bmatrix} 1.0 \\ 1.0 \end{bmatrix} .\]

The condition number of matrix $\mathbf{A} = \begin{bmatrix} 0.001 & 1 \ 1 & 1 \end{bmatrix}$ is 2.262, so the problem itself is well-conditioned: a small change in $\mathbf{A}$ won’t yield a large change in the solution $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$. It is the characteristic of the algorithm that yields the large error between the true solution $\mathbf{x}\approx(1, 1)$ and the computed solution (or the output of the algorithm) $\hat{\mathbf{x}}=(0, 1)$. This example suggests that with GE/LU, the worst-case relative error $|\mathbf{x}-\hat{\mathbf{x}}| / |\mathbf{x}| \ge 1/\sqrt{2}$, not small.

The root cause is the vastly different scale of $a_{11}$ with other coefficients and the order of ellimination. In particular, any $a_{22}$ such that $[a_{22} - 1000] = -1000$ will yield the same solution $\hat{\mathbf{x}}=(0, 1)$. (Recall Scenario 1 of Catastrophic Cancellation in Lecture 1.) The $1000 = a_{22}/a_{11}$ causes the loss of information in $a_{22}$ and $b_2$.

Indeed, the LU decomposition of $\mathbf{A}$ within the precision is

\(\mathbf{L} = \begin{bmatrix} 1 & 0 \\ 1000 & 1 \end{bmatrix}, \quad \mathbf{U} = \begin{bmatrix} 0.001 & 1 \\ 0 & -1000 \end{bmatrix},\) resulting in $\mathbf{L}\mathbf{U} = \begin{bmatrix} 0.001 & 1 \ 1 & 0 \end{bmatrix}$, and $|\mathbf{A}-\mathbf{L}\mathbf{U}| = 1 \approx |\mathbf{A}|=1.62$.

\[\begin{split} x_1 + x_2 &= 2 \\ 0.001x_1 + x_2 &= 1 \end{split}\]

\(\begin{split} x_1 + x_2 &= 2 \\ (1-0.001)x_2 &= 1 - 0.002 \end{split}\) or \(\begin{split} x_1 + x_2 &= 2 \\ 1.0 x_2 &= 1.0 \end{split} ,\) yielding \(\begin{bmatrix} x_1 \\ x_ 2 \end{bmatrix} = \begin{bmatrix} 1.0 \\ 1.0 \end{bmatrix} .\)

Note that $0.001=a_{11}/a_{22}$ does not cause a loss of information in $a_{12}$ and $b_1$. The LU decomposition within the precision is \(\mathbf{L} = \begin{bmatrix} 1 & 0 \\ 0.001 & 1 \end{bmatrix}, \quad \mathbf{U} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad \mathbf{L}\mathbf{U} = \begin{bmatrix} 1 & 1 \\ 0.001 & 1.0 \end{bmatrix}.\)

Implementation

LAPACK (Linear Algebra Package) is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. -Wikipedia

A  # just to recall
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    4   -9    7
[3,]    2    1    3
# second argument indicates partial pivoting (default) or not
alu <- Matrix::lu(A)
class(alu)

‘denseLU’

ealu <- expand(alu)
ealu$L
3 x 3 Matrix of class "dtrMatrix" (unitriangular)
     [,1]       [,2]       [,3]      
[1,] 1.00000000          .          .
[2,] 0.50000000 1.00000000          .
[3,] 0.50000000 0.09090909 1.00000000
ealu$U
3 x 3 Matrix of class "dtrMatrix"
     [,1]      [,2]      [,3]     
[1,]  4.000000 -9.000000  7.000000
[2,]         .  5.500000 -0.500000
[3,]         .         . -1.454545
ealu$P
3 x 3 sparse Matrix of class "pMatrix"
          
[1,] . . |
[2,] | . .
[3,] . | .
with(ealu, L %*% U)
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    4   -9    7
[2,]    2    1    3
[3,]    2   -4    2
with(ealu, P %*% L %*% U)
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    4   -9    7
[3,]    2    1    3
det(A) # this does LU!

-32

det(ealu$U) # this is cheap

-32

solve(A) # this does LU!
3 x 3 Matrix of class "dgeMatrix"
        [,1]    [,2]   [,3]
[1,]  1.0625 -0.4375 0.3125
[2,] -0.0625 -0.0625 0.1875
[3,] -0.6875  0.3125 0.0625
with(ealu, solve(U) %*% solve(L) %*% t(P) ) # this is cheap???
3 x 3 Matrix of class "dgeMatrix"
        [,1]    [,2]   [,3]
[1,]  1.0625 -0.4375 0.3125
[2,] -0.0625 -0.0625 0.1875
[3,] -0.6875  0.3125 0.0625