I wanted to know how I can manually compute what is returned when I call the vcov() function in R on a lm object, i.e. the variance-covariance matrix.
The diagonal entries are very straightforward, one just needs to take the st. error of every parameters and square it. For the estimates under for example, entry (z,z) of the variance-covariance matrix is simply computed by taking z's estimated coefficient's standard error (0.1096658) and squaring it. But what about the non-diagonal entries? Could someone provide the code to manually compute these?
# Minimal working example code
library(data.table)
df <- data.table(y = runif(100, 0, 100),
x = runif(100, 0, 100),
z = runif(100, 0, 100))
coef(lm(y ~ x z, df)) # coefficients
coef(summary(lm(y ~ x z, df)))[,2] # standard errors
vcov(lm(y ~ x z, df)) # variance-covariance matrix
# Example result
coef(lm(y ~ x z, df))
(Intercept) x z
54.44460066 0.03052479 -0.11197309
coef(summary(lm(y ~ x z, df)))[,2]
(Intercept) x z
8.4668880 0.1077858 0.1096658
vcov(lm(y ~ x z, df))
(Intercept) x z
(Intercept) 71.6881931 -0.576781121 -0.585380616
x -0.5767811 0.011617776 -0.001059506
z -0.5853806 -0.001059506 0.012026594
CodePudding user response:
Looking at the details of the function it seems the be the unscaled covariance times sigma squared (in case complete cases is false):
suppressMessages(library(data.table))
df <- data.table(y = runif(100, 0, 100),
x = runif(100, 0, 100),
z = runif(100, 0, 100))
stats:::vcov.summary.lm
#> function (object, complete = TRUE, ...)
#> .vcov.aliased(object$aliased, object$sigma^2 * object$cov.unscaled,
#> complete = complete)
#> <bytecode: 0x5619b28eec88>
#> <environment: namespace:stats>
stats:::.vcov.aliased
#> function (aliased, vc, complete = TRUE)
#> {
#> if (complete && NROW(vc) < (P <- length(aliased)) && any(aliased)) {
#> cn <- names(aliased)
#> VC <- matrix(NA_real_, P, P, dimnames = list(cn, cn))
#> j <- which(!aliased)
#> VC[j, j] <- vc
#> VC
#> }
#> else vc
#> }
#> <bytecode: 0x5619b3644430>
#> <environment: namespace:stats>
vcov(mod<-lm(y ~ x z, df))
#> (Intercept) x z
#> (Intercept) 54.1734049 -0.4767886045 -0.4498457055
#> x -0.4767886 0.0089655548 0.0005973927
#> z -0.4498457 0.0005973927 0.0082190981
summary(mod)$cov.unscaled* summary(mod)$sigma^2
#> (Intercept) x z
#> (Intercept) 54.1734049 -0.4767886045 -0.4498457055
#> x -0.4767886 0.0089655548 0.0005973927
#> z -0.4498457 0.0005973927 0.0082190981
Created on 2021-10-01 by the reprex package (v2.0.1)
CodePudding user response:
For a "manual computation" you can check this
library(data.table)
df = data.table(y = runif(100, 0, 100),
x = runif(100, 0, 100),
z = runif(100, 0, 100))
mod = lm(y ~ ., df)
X = cbind(1, as.matrix(df[, -1]))
invXtX = solve(crossprod(X))
coef = invXtX %*% t(X) %*% df$y
resid = df$y - X %*% coef
df.res = nrow(X) - ncol(X)
manual = invXtX * sum(resid**2)/(df.res)
funct = vcov(mod)
all.equal(manual, funct, check.attributes = F)# should be TRUE