[Statistical Computing] 3. Cholesky Decomposition

Computational Statistics



Table of Contents

  1. Computer Arithmetic
  2. LU decomposition
  3. Cholesky Decomposition
  4. QR Decomposition
  5. Iterative Methods
  6. Non-linear Equations
  7. Optimization
  8. Numerical Integration
  9. Random Number Generation
  10. Monte Carlo Integration


sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6 mvtnorm_1.0-10       Matrix_1.2-17       

loaded via a namespace (and not attached):
 [1] lattice_0.20-44   fansi_0.5.0       digest_0.6.27     utf8_1.2.1       
 [5] crayon_1.4.1      IRdisplay_1.0     grid_3.6.3        repr_1.1.3       
 [9] lifecycle_1.0.0   jsonlite_1.7.2    evaluate_0.14     pillar_1.6.1     
[13] rlang_0.4.11      uuid_0.1-4        vctrs_0.3.8       ellipsis_0.3.2   
[17] IRkernel_1.1      tools_3.6.3       compiler_3.6.3    base64enc_0.1-3  
[21] pbdZMQ_0.3-5      htmltools_0.5.1.1

Required R packages

library(Matrix)
library(mvtnorm)
library(microbenchmark)

Cholesky Decomposition

The structure should be exploited whenever possible in solving a problem.

Common structures include: symmetry, positive (semi)definiteness, sparsity, Kronecker product, low rank, …

Cholesky decomposition

Source: http://www.netlib.org
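
In brief: a positive definite $A \in \mathbb{R}^{n \times n}$ factors uniquely as $A = \mathbf{L}\mathbf{L}^T$ with $\mathbf{L}$ lower triangular and positive diagonal entries, computed column by column via $$\ell_{jj} = \Big(a_{jj} - \sum_{k=1}^{j-1} \ell_{jk}^2\Big)^{1/2}, \qquad \ell_{ij} = \frac{1}{\ell_{jj}}\Big(a_{ij} - \sum_{k=1}^{j-1} \ell_{ik}\ell_{jk}\Big), \quad i = j+1, \ldots, n,$$ at a cost of about $(1/3)n^3$ flops, half the cost of LU.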

Pivoting
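
When $A$ is only positive semidefinite with rank $r < n$, a pivoted variant is used: there is a permutation matrix $P$ (typically chosen so that the largest remaining diagonal entry is pivoted to the front at each step) such that $$P^T A P = \mathbf{L}\mathbf{L}^T,$$ where the last $n - r$ diagonal entries of $\mathbf{L}$ are zero; the number of positive pivots reveals the numerical rank. The positive semidefinite example below illustrates this.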

Implementation
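
A minimal plain-R version of the column-wise recursion above, for exposition only (the name cholesky_naive is illustrative; in practice one calls chol(), which dispatches to optimized LAPACK code):

# naive, unpivoted Cholesky: build L column by column (~ n^3/3 flops)
cholesky_naive <- function(A) {
    n <- nrow(A)
    L <- matrix(0, n, n)
    for (j in seq_len(n)) {
        # diagonal entry of column j
        L[j, j] <- sqrt(A[j, j] - sum(L[j, seq_len(j - 1)]^2))
        # entries below the diagonal in column j
        for (i in seq_len(n - j) + j) {
            L[i, j] <- (A[i, j] - sum(L[i, seq_len(j - 1)] * L[j, seq_len(j - 1)])) / L[j, j]
        }
    }
    L
}

# quick check on the 3 x 3 matrix used in the example below
M <- matrix(c(4, 12, -16, 12, 37, -43, -16, -43, 98), nrow = 3)
cholesky_naive(M)   # lower-triangular factor L
t(chol(M))          # chol() returns the upper factor R = t(L)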

Example: positive definite matrix

A <- t(Matrix(c(4, 12, -16, 12, 37, -43, -16, -43, 98), nrow=3))  # check out this is pd!
A
3 x 3 Matrix of class "dsyMatrix"
     [,1] [,2] [,3]
[1,]    4   12  -16
[2,]   12   37  -43
[3,]  -16  -43   98
# Cholesky without pivoting
Achol <- Matrix::chol(Matrix::symmpart(A))
Achol
3 x 3 Matrix of class "Cholesky"
     [,1] [,2] [,3]
[1,]    2    6   -8
[2,]    .    1    5
[3,]    .    .    3
class(Achol)

‘Cholesky’

getSlots("Cholesky")

x: "numeric"    Dim: "integer"    Dimnames: "list"    uplo: "character"    diag: "character"

str(Achol)
Formal class 'Cholesky' [package "Matrix"] with 5 slots
  ..@ x       : num [1:9] 2 0 0 6 1 0 -8 5 3
  ..@ Dim     : int [1:2] 3 3
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ uplo    : chr "U"
  ..@ diag    : chr "N"
b <- c(1.0, 2.0, 3.0)
solve(A, b) # this does LU; wasteful!; 2/3 n^3 + 2n^2
3 x 1 Matrix of class "dgeMatrix"
          [,1]
[1,] 28.583333
[2,] -7.666667
[3,]  1.333333
y <- solve(t(Achol), b) # two triangular solves; only 2n^2 flops
solve(Achol, y)
3 x 1 Matrix of class "dgeMatrix"
          [,1]
[1,] 28.583333
[2,] -7.666667
[3,]  1.333333
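
The same pair of triangular solves can be written out with base R's forwardsolve() and backsolve(), which makes the triangular structure explicit. A sketch equivalent to the two solve() calls above (z is just a temporary name):

# explicit triangular solves with base R
R <- as.matrix(Achol)            # upper-triangular factor, so A = t(R) %*% R
z <- forwardsolve(t(R), b)       # forward substitution: solve t(R) z = b
backsolve(R, z)                  # back substitution:    solve R x = z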
det(A) # this actually does LU; wasteful!

36.0000000000001

det(Achol)^2 # cheap

36

solve(A) # this does LU!
3 x 3 Matrix of class "dsyMatrix"
           [,1]        [,2]       [,3]
[1,]  49.361111 -13.5555556  2.1111111
[2,] -13.555556   3.7777778 -0.5555556
[3,]   2.111111  -0.5555556  0.1111111
solve(Achol, solve(t(Achol))) # 2n triangular solves
3 x 3 Matrix of class "dgeMatrix"
           [,1]        [,2]       [,3]
[1,]  49.361111 -13.5555556  2.1111111
[2,] -13.555556   3.7777778 -0.5555556
[3,]   2.111111  -0.5555556  0.1111111
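
Base R's chol2inv() computes $(\mathbf{R}^T\mathbf{R})^{-1}$ directly from an upper-triangular Cholesky factor, which gives the same inverse with a single call:

# inverse from the Cholesky factor: chol2inv(R) returns (t(R) %*% R)^{-1}
chol2inv(as.matrix(Achol))   # should match solve(A) above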

Example: positive semidefinite matrix

(A <- Matrix(c(1,1,.5,1,1,.5,.5,.5,1), nrow=3))
eigen(A, symmetric = TRUE)$values
3 x 3 Matrix of class "dsyMatrix"
     [,1] [,2] [,3]
[1,]  1.0  1.0  0.5
[2,]  1.0  1.0  0.5
[3,]  0.5  0.5  1.0

2.36602540378444 0.633974596215564 0

Matrix::chol(A, pivot=FALSE) # no pivoting (the default); fails because A is positive semidefinite, not positive definite
Error in asMethod(object): not a positive definite matrix
Traceback:
1. Matrix::chol(A, pivot = FALSE)
2. Matrix::chol(A, pivot = FALSE)
3. .local(x, ...)
4. as(x, if (packed) "dppMatrix" else "dpoMatrix")
5. asMethod(object)
6. stop("not a positive definite matrix")
(Achol <- base::chol(A, pivot=TRUE)) # pivoting handles the positive semidefinite case (warning instead of error)
attr(Achol, "pivot")
attr(Achol, "rank")
Warning message in chol.default(A, pivot = TRUE):
“the matrix is either rank-deficient or indefinite”
A matrix: 3 × 3 of type dbl
1  0.5000000  1
0  0.8660254  0
0  0.0000000  0

1 3 2

2

Matrix::rankMatrix(A) # determine rank from SVD, which is more numerically stable

2

Achol[, attr(Achol, "pivot")]
A matrix: 3 × 3 of type dbl
1  1  0.5000000
0  0  0.8660254
0  0  0.0000000
(P <- as(attr(Achol, "pivot"), "pMatrix"))   # permutation matrix, actually P' in our notation
3 x 3 sparse Matrix of class "pMatrix"
          
[1,] | . .
[2,] . . |
[3,] . | .
# P A P' = L L' (Cholesky of the symmetrically permuted matrix)
Matrix::norm(t(P) %*% A %*% P - t(Achol) %*% Achol, type="F")  # Frobenius norm

1.11022302462516e-16
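
Equivalently, undoing the permutation on the columns of the pivoted factor recovers $A$ itself (order() inverts the pivot; piv is just a shorthand):

# undo the permutation: the column-reordered factor reproduces A
piv <- attr(Achol, "pivot")
norm(as.matrix(A) - crossprod(Achol[, order(piv)]), type = "F")   # ~ machine precision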

Applications

Multivariate normal density

The density of the multivariate normal $\mathcal{N}(\mathbf{0}, \Sigma)$, where $\Sigma$ is $n\times n$ positive definite, is \(\frac{1}{(2\pi)^{n/2}\det(\Sigma)^{1/2}}\exp\left(-\frac{1}{2}\mathbf{y}^T\Sigma^{-1}\mathbf{y}\right).\)

Hence the log-likelihood is $$\log p(\mathbf{y}) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log\det\Sigma - \frac{1}{2}\mathbf{y}^T\Sigma^{-1}\mathbf{y}.$$ With the Cholesky factor $\Sigma = \mathbf{R}^T\mathbf{R}$, $\log\det\Sigma = 2\sum_i \log r_{ii}$ and $\mathbf{y}^T\Sigma^{-1}\mathbf{y} = \|\mathbf{R}^{-T}\mathbf{y}\|_2^2$, so neither an explicit inverse nor an explicit determinant is needed.

Which method is better?

# word-by-word transcription of mathematical expression
# Matrix::determinant computes the logarithm of the determinant
logpdf_mvn_1 <- function(y, Sigma) {
    n <- length(y)
    - (n / 2) * log(2 * pi) - (1 / 2) * Matrix::determinant(Sigma)$modulus - (1 / 2) * t(y) %*% solve(Sigma) %*% y
}

# you learnt why you should avoid inversion
logpdf_mvn_2 <- function(y, Sigma) {
    n <- length(y)
    Sigmachol <- Matrix::chol(Matrix::symmpart(Sigma))
    - (n / 2) * log(2 * pi) - Matrix::determinant(Sigmachol)$modulus - (1 / 2) * sum(y * solve(Sigma, y))
}

# this is for the efficiency-concerned  
logpdf_mvn_3 <- function(y, Sigma) {
    n <- length(y)
    Sigmachol <- Matrix::chol(Matrix::symmpart(Sigma))
    - 0.5 * n * log(2 * pi) - sum(log(diag(Sigmachol))) - 0.5 * sum(abs(solve(t(Sigmachol), y))^2)
}
set.seed(123) # seed

n <- 1000
# a pd matrix
Sigma_half <- Matrix(rnorm(n * (2 * n)), nrow=n)
Sigma <- Sigma_half %*% t(Sigma_half)
Sigma <- Matrix::symmpart(Sigma)
Sigmachol <- Matrix::chol(Sigma)

y <- Sigmachol %*% rnorm(n)  # one random sample from N(0, Sigma). Homework

# at least they give same answer
(logpdf_mvn_1(y, Sigma))
(logpdf_mvn_2(y, Sigma))
(logpdf_mvn_3(y, Sigma))
1 x 1 Matrix of class "dgeMatrix"
          [,1]
[1,] -5272.565

-5272.56471529637

-5272.56471529637
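
As a sanity check (not included in the benchmark below), the same value can be obtained from the already-loaded mvtnorm package; dmvnorm() expects base R vectors and matrices, hence the coercions:

# cross-check with mvtnorm
mvtnorm::dmvnorm(as.numeric(y), sigma = as.matrix(Sigma), log = TRUE)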

res <- microbenchmark::microbenchmark(logpdf_mvn_1(y, Sigma), logpdf_mvn_2(y, Sigma), logpdf_mvn_3(y, Sigma))
print(res)
Unit: milliseconds
                   expr      min       lq     mean   median       uq      max
 logpdf_mvn_1(y, Sigma) 405.5765 416.8209 442.0820 425.4402 441.0190 572.9140
 logpdf_mvn_2(y, Sigma) 164.7256 168.7226 178.1795 171.7639 176.7348 307.7929
 logpdf_mvn_3(y, Sigma) 169.3317 176.3511 188.6951 180.4394 191.4759 311.4974
 neval cld
   100   b
   100  a 
   100  a 

Linear regression by Cholesky

Total computational cost is $np^2 + (1/3) p^3$ (without s.e.) or $np^2 + (5/3) p^3$ (with s.e.) flops.
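
A sketch of the procedure under illustrative names (X, yy, betahat): form the Gram matrix $\mathbf{X}^T\mathbf{X}$ at $np^2$ flops, Cholesky-factor it at $(1/3)p^3$ flops, then solve the normal equations with two triangular solves.

# least squares via Cholesky of the normal equations X'X beta = X'y
set.seed(123)
n <- 100; p <- 5
X  <- matrix(rnorm(n * p), n, p)
yy <- X %*% rnorm(p) + rnorm(n)

G <- crossprod(X)                       # X'X: ~ n p^2 flops
R <- chol(G)                            # X'X = R'R: ~ (1/3) p^3 flops
betahat <- backsolve(R, forwardsolve(t(R), crossprod(X, yy)))
betahat
qr.solve(X, yy)                         # QR-based reference; should agree
# for standard errors, chol2inv(R) gives (X'X)^{-1}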