[Computational Statistics] 4. QR Decomposition

Computational Statistics



Table of Contents

  1. Computer Arithmetic
  2. LU decomposition
  3. Cholesky Decomposition
  4. QR Decomposition
  5. Iterative Methods
  6. Non-linear Equations
  7. Optimization
  8. Numerical Integration
  9. Random Number Generation
  10. Monte Carlo Integration


sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] fansi_0.5.0       digest_0.6.27     utf8_1.2.1        crayon_1.4.1     
 [5] IRdisplay_1.0     repr_1.1.3        lifecycle_1.0.0   jsonlite_1.7.2   
 [9] evaluate_0.14     pillar_1.6.1      rlang_0.4.11      uuid_0.1-4       
[13] vctrs_0.3.8       ellipsis_0.3.2    IRkernel_1.1      tools_3.6.3      
[17] compiler_3.6.3    base64enc_0.1-3   pbdZMQ_0.3-5      htmltools_0.5.1.1

Required R packages

library(Matrix)

QR Decomposition

set.seed(2020) # seed

n <- 5
p <- 3
(X <- matrix(rnorm(n * p), nrow=n)) # predictor matrix
(y <- rnorm(n))    # response vector

# find the (minimum L2 norm) least squares solution
lm(y ~ X - 1)
A matrix: 5 × 3 of type dbl
 0.3769721  0.7205735 -0.8531228
 0.3015484  0.9391210  0.9092592
-1.0980232 -0.2293777  1.1963730
-1.1304059  1.7591313 -0.3715839
-2.7965343  0.1173668 -0.1232602

[1]  1.80004312  1.70399588 -3.03876461 -2.28897495  0.05830349

Call:
lm(formula = y ~ X - 1)

Coefficients:
       X1         X2         X3  
 0.615118  -0.008382  -0.770116  

We want to understand what the QR decomposition is and how it is used to solve the least squares problem.

Definitions
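The notes under this heading are abridged here. For reference: for $\mathbf{X} \in \mathbb{R}^{n \times p}$ with $n \geq p$, the full QR decomposition is

\[\mathbf{X} = \mathbf{Q} \mathbf{R}, \quad \mathbf{Q} \in \mathbb{R}^{n \times n} \text{ orthogonal } (\mathbf{Q}^T \mathbf{Q} = \mathbf{I}_n), \quad \mathbf{R} = \begin{pmatrix} \mathbf{R}_1 \\ \mathbf{0} \end{pmatrix} \in \mathbb{R}^{n \times p} \text{ upper triangular},\]

while the reduced (thin) QR keeps only the first $p$ columns of $\mathbf{Q}$: $\mathbf{X} = \mathbf{Q}_1 \mathbf{R}_1$ with $\mathbf{Q}_1 \in \mathbb{R}^{n \times p}$, $\mathbf{Q}_1^T \mathbf{Q}_1 = \mathbf{I}_p$.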

Least squares
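With the reduced QR in hand (and $\mathbf{X}$ of full column rank), the least squares problem

\[\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X} \boldsymbol{\beta}\|_2^2\]

has normal equations $\mathbf{X}^T \mathbf{X} \boldsymbol{\beta} = \mathbf{X}^T \mathbf{y}$, which reduce to the triangular system $\mathbf{R}_1 \boldsymbol{\beta} = \mathbf{Q}_1^T \mathbf{y}$; one back-substitution yields $\hat{\boldsymbol{\beta}}$ without ever forming $\mathbf{X}^T \mathbf{X}$.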

Gram-Schmidt procedure
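In terms of the columns $\mathbf{x}_j$ of $\mathbf{X}$, Gram-Schmidt computes, for $j = 1, \ldots, p$,

\[r_{ij} = \mathbf{q}_i^T \mathbf{x}_j \ (i < j), \quad \tilde{\mathbf{q}}_j = \mathbf{x}_j - \sum_{i < j} r_{ij} \mathbf{q}_i, \quad r_{jj} = \|\tilde{\mathbf{q}}_j\|_2, \quad \mathbf{q}_j = \tilde{\mathbf{q}}_j / r_{jj},\]

which is exactly what the code below implements.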

GS conducts reduced QR

(Figure source: https://dsc-spidal.github.io/harp/docs/harpdaal/algorithms/)

Classical Gram-Schmidt

cgs <- function(X) {  # not in-place
    n <- dim(X)[1]
    p <- dim(X)[2]    
    Q <- Matrix(0, nrow=n, ncol=p, sparse=FALSE)
    R <- Matrix(0, nrow=p, ncol=p, sparse=FALSE)
    for (j in seq_len(p)) {
        Q[, j] <- X[, j]
        for (i in seq_len(j-1)) {
            R[i, j] <- sum(Q[, i] * X[, j])  # dot product
            Q[, j] <- Q[, j] - R[i, j] * Q[, i]
        }
        R[j, j] <- Matrix::norm(Q[, j, drop=FALSE], "F")
        Q[, j] <- Q[, j] / R[j, j]
    }
    list(Q=Q, R=R)
}
e <- .Machine$double.eps
(A <- t(Matrix(c(1, 1, 1, e, 0, 0, 0, e, 0, 0, 0, e), nrow=3)))
4 x 3 Matrix of class "dgeMatrix"
             [,1]         [,2]         [,3]
[1,] 1.000000e+00 1.000000e+00 1.000000e+00
[2,] 2.220446e-16 0.000000e+00 0.000000e+00
[3,] 0.000000e+00 2.220446e-16 0.000000e+00
[4,] 0.000000e+00 0.000000e+00 2.220446e-16
res <- cgs(A)
res$Q
4 x 3 Matrix of class "dgeMatrix"
             [,1]       [,2]       [,3]
[1,] 1.000000e+00  0.0000000  0.0000000
[2,] 2.220446e-16 -0.7071068 -0.7071068
[3,] 0.000000e+00  0.7071068  0.0000000
[4,] 0.000000e+00  0.0000000  0.7071068
with(res, t(Q) %*% Q)
3 x 3 Matrix of class "dgeMatrix"
              [,1]          [,2]          [,3]
[1,]  1.000000e+00 -1.570092e-16 -1.570092e-16
[2,] -1.570092e-16  1.000000e+00  5.000000e-01
[3,] -1.570092e-16  5.000000e-01  1.000000e+00
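Each column of res$Q has unit norm, but the $(2,3)$ entry of $\mathbf{Q}^T \mathbf{Q}$ is $0.5$: classical Gram-Schmidt has completely lost orthogonality between the last two columns of this nearly rank-deficient matrix.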

Modified Gram-Schmidt

mgs <- function(X) { # not in-place
    n <- dim(X)[1]
    p <- dim(X)[2]
    R <- Matrix(0, nrow=p, ncol=p)
    for (j in seq_len(p)) {
        for (i in seq_len(j-1)) {
            R[i, j] <- sum(X[, i] * X[, j])  # dot product
            X[, j] <- X[, j] - R[i, j] * X[, i]
        }
        R[j, j] <- Matrix::norm(X[, j, drop=FALSE], "F")
        X[, j] <- X[, j] / R[j, j]
    }
    list(Q=X, R=R)
}
mres <- mgs(A)
mres$Q
4 x 3 Matrix of class "dgeMatrix"
             [,1]       [,2]       [,3]
[1,] 1.000000e+00  0.0000000  0.0000000
[2,] 2.220446e-16 -0.7071068 -0.4082483
[3,] 0.000000e+00  0.7071068 -0.4082483
[4,] 0.000000e+00  0.0000000  0.8164966
with(mres, t(Q) %*% Q)
3 x 3 Matrix of class "dgeMatrix"
              [,1]          [,2]          [,3]
[1,]  1.000000e+00 -1.570092e-16 -9.064933e-17
[2,] -1.570092e-16  1.000000e+00  1.110223e-16
[3,] -9.064933e-17  1.110223e-16  1.000000e+00
(B <- t(Matrix(c(0.7, 1/sqrt(2), 0.7 + e, 1/sqrt(2)), nrow=2)))
2 x 2 Matrix of class "dgeMatrix"
     [,1]      [,2]
[1,]  0.7 0.7071068
[2,]  0.7 0.7071068
mres2 <- mgs(B)
mres2$Q
2 x 2 Matrix of class "dgeMatrix"
          [,1] [,2]
[1,] 0.7071068    1
[2,] 0.7071068    0
with(mres2, t(Q) %*% Q)
2 x 2 Matrix of class "dgeMatrix"
          [,1]      [,2]
[1,] 1.0000000 0.7071068
[2,] 0.7071068 1.0000000
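MGS repairs the first example, but the columns of $\mathbf{B}$ differ only by $\epsilon$, and the off-diagonal entry $0.7071$ of $\mathbf{Q}^T \mathbf{Q}$ shows that MGS too loses orthogonality on such nearly collinear columns. This motivates the Householder approach below.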

QR by Householder transform

Alston Scott Householder (1904-1993)

Algorithm

for (j in seq_len(p)) {  # in-place
    u <- Householder(X[j:n, j])  # unit Householder vector for column j
    for (i in j - 1 + seq_len(max(p - j + 1, 0))) {  # columns j, ..., p
        X[j:n, i] <- X[j:n, i] - 2 * u * sum(u * X[j:n, i])  # (I - 2uu')x_i
    }
}
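Base R has no Householder() helper; a minimal sketch of what the pseudocode assumes (the helper returns the unit vector $\mathbf{u}$ of the reflection $\mathbf{H} = \mathbf{I} - 2\mathbf{u}\mathbf{u}^T$) might look like this:

Householder <- function(x) {
    # unit vector u such that (I - 2 u u') x = -s * ||x|| * e_1
    s <- ifelse(x[1] >= 0, 1, -1)  # sign chosen to avoid catastrophic cancellation
    v <- x
    v[1] <- v[1] + s * sqrt(sum(x^2))
    v / sqrt(sum(v^2))
}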

Implementation

X  # to recall
y
A matrix: 5 × 3 of type dbl
 0.3769721  0.7205735 -0.8531228
 0.3015484  0.9391210  0.9092592
-1.0980232 -0.2293777  1.1963730
-1.1304059  1.7591313 -0.3715839
-2.7965343  0.1173668 -0.1232602

[1]  1.80004312  1.70399588 -3.03876461 -2.28897495  0.05830349

coef(lm(y ~ X - 1)) # least squares solution by QR

          X1           X2           X3 
 0.615117664 -0.008382116 -0.770116370 

# same as
qr.solve(X, y)  # or solve(qr(X), y)

[1]  0.615117664 -0.008382116 -0.770116370

(qrobj <- qr(X, LAPACK=TRUE))
$qr
           [,1]        [,2]       [,3]
[1,] -3.2460924  0.46519442  0.1837043
[2,]  0.0832302 -2.08463446  0.3784090
[3,] -0.3030648 -0.05061826 -1.7211061
[4,] -0.3120027  0.61242637 -0.4073016
[5,] -0.7718699  0.10474144 -0.3750994

$rank
[1] 3

$qraux
[1] 1.116131 1.440301 1.530697

$pivot
[1] 1 2 3

attr(,"useLAPACK")
[1] TRUE
attr(,"class")
[1] "qr"

The upper triangular part of qrobj$qr contains the $\mathbf{R}$ of the decomposition and the lower triangular part contains information on the $\mathbf{Q}$ of the decomposition (stored in compact form using $\mathbf{Q} = \mathbf{H}_1 \cdots \mathbf{H}_p$, HW).
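As a quick sanity check (reusing qrobj, n, and p from above), applying the compact-form $\mathbf{Q}$ to the first $p$ columns of the identity reproduces qr.Q():

# multiply by Q = H_1 ... H_p without ever forming Q explicitly
all.equal(qr.qy(qrobj, diag(1, n, p)), qr.Q(qrobj))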

qr.Q(qrobj)  # extract Q
A matrix: 5 × 3 of type dbl
-0.11613105 -0.3715745  0.40159171
-0.09289581 -0.4712268 -0.64182039
 0.33825998  0.1855167 -0.61822569
 0.34823590 -0.7661458  0.08461993
 0.86150792  0.1359480  0.19346097
qr.R(qrobj) # extract R
A matrix: 3 × 3 of type dbl
-3.246092  0.4651944  0.1837043
 0.000000 -2.0846345  0.3784090
 0.000000  0.0000000 -1.7211061
qr.qty(qrobj, y)  # this uses the "compact form"
A matrix: 5 × 1 of type dbl
-2.1421018
-0.2739453
1.3254520
-3.3263425
1.7707709
t(qr.Q(qrobj)) %*% y  # waste of resource
A matrix: 3 × 1 of type dbl
-2.1421018
-0.2739453
1.3254520
solve(qr.R(qrobj), qr.qty(qrobj, y)[1:p])  # solves R * beta = Q' * y

[1]  0.615117664 -0.008382116 -0.770116370

Xchol <- Matrix::chol(Matrix::symmpart(t(X) %*% X))  # solution using Cholesky of X'X
solve(Xchol, solve(t(Xchol), t(X) %*% y))
A matrix: 3 × 1 of type dbl
0.615117664
-0.008382116
-0.770116370

Let's get back to the odd A and B matrices that fooled Gram-Schmidt.

qrA <- qr(A, LAPACK=TRUE)
qr.Q(qrA)
A matrix: 4 × 3 of type dbl
-1.000000e+00  1.570092e-16  9.064933e-17
-2.220446e-16 -7.071068e-01 -4.082483e-01
 0.000000e+00  7.071068e-01 -4.082483e-01
 0.000000e+00  0.000000e+00  8.164966e-01
t(qr.Q(qrA)) %*% qr.Q(qrA)   # orthogonality preserved
A matrix: 3 × 3 of type dbl
1  0.000000e+00  0.000000e+00
0  1.000000e+00 -1.110223e-16
0 -1.110223e-16  1.000000e+00
qrB <- qr(B, LAPACK=TRUE)
qr.Q(qrB)
A matrix: 2 × 2 of type dbl
-0.7071068 -0.7071068
-0.7071068  0.7071068
t(qr.Q(qrB)) %*% qr.Q(qrB)  # orthogonality preserved
A matrix: 2 × 2 of type dbl
 1.000000e+00 -1.110223e-16
-1.110223e-16  1.000000e+00

QR by Givens rotation
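The notes for this part are omitted here. As a minimal sketch (the helper name givens is ours, not base R's): a Givens rotation acts on two coordinates at a time and zeroes one entry per application, so QR is obtained by annihilating the subdiagonal entries of $\mathbf{X}$ one by one.

givens <- function(a, b) {
    # 2 x 2 rotation G with G %*% c(a, b) == c(r, 0)
    r  <- sqrt(a^2 + b^2)
    cs <- a / r
    sn <- b / r
    matrix(c(cs, -sn, sn, cs), nrow=2)  # rows: (cs, sn) and (-sn, cs)
}
givens(3, 4) %*% c(3, 4)  # c(5, 0): the second entry is annihilated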

Conditioning of linear regression by least squares

\[\theta = \cos^{-1}\frac{\|\mathbf{X}\boldsymbol{\beta}\|}{\|\mathbf{y}\|}, \quad \eta = \frac{\|\mathbf{X}\|\|\boldsymbol{\beta}\|}{\|\mathbf{X}\boldsymbol{\beta}\|} .\]
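Here $\theta$ is the angle between $\mathbf{y}$ and the column space of $\mathbf{X}$. Under the standard perturbation analysis of least squares, the sensitivity of $\hat{\boldsymbol{\beta}}$ to perturbations in $\mathbf{X}$ scales as $\kappa(\mathbf{X}) + \kappa(\mathbf{X})^2 \tan\theta / \eta$, which a backward stable QR solver attains; solving the normal equations instead always pays $\kappa(\mathbf{X}^T \mathbf{X}) = \kappa(\mathbf{X})^2$.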

Summary of linear regression

Methods for solving linear regression $\widehat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$:

Method              Flops                     Remarks                    Software          Stability
Cholesky            $np^2 + p^3/3$            -                          -                 unstable
QR by Householder   $2np^2 - (2/3)p^3$        -                          R, Julia, MATLAB  stable
QR by MGS           $2np^2$                   $\mathbf{Q}_1$ available   -                 stable
QR by SVD           $4n^2p + 8np^2 + 9p^3$    $\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^T$   -     most stable

Remarks:

  1. When $n \gg p$, Cholesky is about twice as fast as QR and needs less space.
  2. Cholesky is based on the Gram matrix $\mathbf{X}^T \mathbf{X}$, which can be updated dynamically as new data arrive. It can therefore handle huge-$n$, moderate-$p$ data sets that do not fit into memory (see the sketch after this list).
  3. QR methods are more stable and produce numerically more accurate solutions.
  4. MGS appears slower than Householder, but it yields $\mathbf{Q}_1$ explicitly.
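A minimal sketch of remark 2, reusing the X and y generated above and splitting them into two hypothetical chunks; only the $p \times p$ Gram matrix and the $p$-vector $\mathbf{X}^T \mathbf{y}$ ever need to be in memory:

# accumulate X'X and X'y chunk by chunk, then solve the normal equations once
chunks <- list(list(X=X[1:3, ], y=y[1:3]), list(X=X[4:5, ], y=y[4:5]))
XtX <- matrix(0, p, p)
Xty <- numeric(p)
for (chunk in chunks) {
    XtX <- XtX + crossprod(chunk$X)           # add X_i' X_i
    Xty <- Xty + crossprod(chunk$X, chunk$y)  # add X_i' y_i
}
U <- chol(XtX)                                # X'X = U'U, U upper triangular
drop(backsolve(U, forwardsolve(t(U), Xty)))   # same solution as qr.solve(X, y)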

There is simply no such thing as a universal ‘gold standard’ when it comes to algorithms.

Underdetermined least squares