I'm trying to plot error bands around a linear regression. I'm working with the builtin trees
dataset in R. Here's my code. No lines are showing up on the plot. Please help!
xdev <- log(Volume) - mean(log(Volume))
xdev
ydev <- Girth - mean(Girth)
ydev
b1 <- sum((xdev)*(ydev))/sum(xdev^2)
b0 <- mean(Girth) - mean(log(Volume))*b1
plot(log(Volume) ~ Girth)
abline(coef(lm(log(Volume) ~ Girth)), col=2)
(paste(b0, b1))
y.hat <- b0 b1*log(Volume)
ss.explained <- sum((y.hat - mean(Girth))^2)
ss.unexplained <- sum((y.hat - Girth)^2)
stderr.b1 <- sqrt((ss.unexplained/29)/sum(xdev^2))
stderr.b1
std.yhat <- sqrt((ss.unexplained/29)*((1/31) (xdev^2/sum(xdev^2))))
std.yhat
upp <- y.hat qt(0.95, 29)*std.yhat
low <- y.hat - qt(0.95, 29)*std.yhat
upp
low
library(lattice)
plot(log(Volume) ~ Girth, data = trees)
abline(c(b0, b1), lty=1)
points(upp ~ Girth, data=trees, type="l", lty=2)
points(low ~ Girth, data=trees, type="l", lty=2)
CodePudding user response:
Obviously you want to calculate OLS by hand as well as predictions with confidence bands. Since I'm not familiar with the approach you are using (but you definitely mixed up Y and X in the beginning of your code), I show you how to get the confidence bands with a similar manual approach I'm more familiar with.
data(trees) ## load trees dataset
## fit
X <- cbind(1, trees$Girth)
y <- log(trees$Volume)
beta <- solve(t(X) %*% X) %*% t(X) %*% y
y.hat <- as.vector(X %*% beta)
## std. err. y.hat
res <- y - y.hat
n <- nrow(trees); k <- ncol(X)
VCV <- 1/(n - k)*as.vector(t(res) %*% res)*solve(t(X) %*% X)
se.yhat <- sqrt(diag(X %*% VCV %*% t(X)))
## CIs
tcrit <- qt(0.975, n - k)
upp <- y.hat tcrit*se.yhat
low <- y.hat - tcrit*se.yhat
## plot
with(trees, plot(x=Girth, y=log(Volume)))
abline(a=beta, col=2)
lines(x=trees$Girth, y=upp, lty=2)
lines(x=trees$Girth, y=low, lty=2)
We can compare this with the results of respective R functions, which will give us the same values and, thus, the same plot as above.
fit <- lm(log(Volume) ~ Girth, trees)
pred <- stats::predict.lm(fit, se.fit=TRUE, df=fit$df.residual)
with(trees, plot(x=Girth, y=log(Volume)))
abline(fit, col=2)
lines(x=trees$Girth, y=pred$fit tcrit*pred$se.fit, lty=2)
lines(x=trees$Girth, y=pred$fit - tcrit*pred$se.fit, lty=2)
Since you have noted lattice
, you can further crosscheck using lattice::xyplot
:
lattice::xyplot(fit$fitted.values ~ log(trees$Volume),
panel=mosaic::panel.lmbands)
Note: It seems you are using attach
, which is discouraged in the community, see: Why is it not advisable to use attach() in R, and what should I use instead?. I therefore used with()
and `$`()
instead.