How is the result of the vcov() function in R computed?-CodePudding

I wanted to know how I can manually compute what is returned when I call the vcov() function in R on a lm object, i.e. the variance-covariance matrix.

The diagonal entries are very straightforward, one just needs to take the st. error of every parameters and square it. For the estimates under for example, entry (z,z) of the variance-covariance matrix is simply computed by taking z's estimated coefficient's standard error (0.1096658) and squaring it. But what about the non-diagonal entries? Could someone provide the code to manually compute these?

# Minimal working example code
library(data.table)
df <- data.table(y = runif(100, 0, 100),
                 x = runif(100, 0, 100),
                 z = runif(100, 0, 100))
coef(lm(y ~ x   z, df)) # coefficients
coef(summary(lm(y ~ x   z, df)))[,2] # standard errors
vcov(lm(y ~ x   z, df)) # variance-covariance matrix

# Example result
coef(lm(y ~ x   z, df))
(Intercept)           x           z 
54.44460066  0.03052479 -0.11197309 

coef(summary(lm(y ~ x   z, df)))[,2]
(Intercept)           x           z 
  8.4668880   0.1077858   0.1096658 

vcov(lm(y ~ x   z, df))
            (Intercept)            x            z
(Intercept)  71.6881931 -0.576781121 -0.585380616
x            -0.5767811  0.011617776 -0.001059506
z            -0.5853806 -0.001059506  0.012026594

CodePudding user response：

Looking at the details of the function it seems the be the unscaled covariance times sigma squared (in case complete cases is false):

suppressMessages(library(data.table))
df <- data.table(y = runif(100, 0, 100),
                 x = runif(100, 0, 100),
                 z = runif(100, 0, 100))
stats:::vcov.summary.lm
#> function (object, complete = TRUE, ...) 
#> .vcov.aliased(object$aliased, object$sigma^2 * object$cov.unscaled, 
#>     complete = complete)
#> <bytecode: 0x5619b28eec88>
#> <environment: namespace:stats>
stats:::.vcov.aliased
#> function (aliased, vc, complete = TRUE) 
#> {
#>     if (complete && NROW(vc) < (P <- length(aliased)) && any(aliased)) {
#>         cn <- names(aliased)
#>         VC <- matrix(NA_real_, P, P, dimnames = list(cn, cn))
#>         j <- which(!aliased)
#>         VC[j, j] <- vc
#>         VC
#>     }
#>     else vc
#> }
#> <bytecode: 0x5619b3644430>
#> <environment: namespace:stats>
vcov(mod<-lm(y ~ x   z, df))
#>             (Intercept)             x             z
#> (Intercept)  54.1734049 -0.4767886045 -0.4498457055
#> x            -0.4767886  0.0089655548  0.0005973927
#> z            -0.4498457  0.0005973927  0.0082190981
summary(mod)$cov.unscaled* summary(mod)$sigma^2
#>             (Intercept)             x             z
#> (Intercept)  54.1734049 -0.4767886045 -0.4498457055
#> x            -0.4767886  0.0089655548  0.0005973927
#> z            -0.4498457  0.0005973927  0.0082190981

^{Created on 2021-10-01 by the reprex package (v2.0.1)}

CodePudding user response：

For a "manual computation" you can check this


library(data.table)
df     = data.table(y = runif(100, 0, 100),
                    x = runif(100, 0, 100),
                    z = runif(100, 0, 100))
mod    = lm(y ~ ., df)
X      = cbind(1, as.matrix(df[, -1]))
invXtX = solve(crossprod(X))
coef   = invXtX %*% t(X) %*% df$y
resid  = df$y - X %*% coef
df.res = nrow(X) - ncol(X)
manual = invXtX * sum(resid**2)/(df.res)
funct  = vcov(mod)
all.equal(manual, funct, check.attributes = F)# should be TRUE