This is the dataset:
heartData <- structure(list(id = 1:6, biking = c(30.80124571, 65.12921517,
1.959664531, 44.80019562, 69.42845368, 54.40362555), smoking = c(10.89660802,
2.219563176, 17.58833051, 2.802558875, 15.9745046, 29.33317552
), heart.disease = c(11.76942278, 2.854081478, 17.17780348, 6.816646909,
4.062223522, 9.550045997)), row.names = c(NA, 6L), class = "data.frame")
I fit a multiple linear regression model:
model.1 <- lm(heart.disease ~ biking + smoking, data = heartData)
plotting.data is a synthetic data set for which I want to check the confidence interval as well as the prediction interval.
plotting.data <- expand.grid(
biking = seq(min(heartData$biking), max(heartData$biking), length.out = 5),
smoking = c(mean(heartData$smoking)))
plotting.data$predicted.y <- predict(model.1, newdata = plotting.data, interval = 'confidence')
plotting.data$smoking <- round(plotting.data$smoking, digits = 2)
plotting.data$smoking <- as.factor(plotting.data$smoking)
After running the block of code above, I can see that plotting.data has 5 columns. However, when I run colnames(plotting.data), I get only 3 column names. plotting.data$predicted.y is a single column, and I can't access or rename plotting.data$predicted.y[,"fit"], plotting.data$predicted.y[,"upr"], or plotting.data$predicted.y[,"lwr"].
To plot the results:
heart.plot <- ggplot(data = heartData, aes(x = biking, y = heart.disease)) +
  geom_point() +
  geom_line(data = plotting.data, aes(x = biking, y = predicted.y[,"fit"], color = "red"), size = 1.25) +
  geom_ribbon(data = plotting.data, aes(ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)
heart.plot
I get the error:
Error in FUN(X[[i]], ...) : object 'heart.disease' not found
I don't know why I'm getting this error. From trial and error, I know that the following part of the code causes it, but I don't know how to write it correctly:
geom_ribbon(data = plotting.data, aes(ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)
CodePudding user response:
This happens because aesthetics named in the aes() wrapper inside ggplot() are inherited by every geom you add afterwards, so each of those variables is expected to exist in whatever data set a geom uses. Here geom_ribbon() inherits y = heart.disease, which is not a column of plotting.data. If you want to use multiple data sets that don't share the same variables, give each geom its own aes() wrapper instead:
ggplot() +
  geom_point(data = heartData, aes(x = biking, y = heart.disease)) +
  geom_line(data = plotting.data, aes(x = biking, y = predicted.y[,"fit"]), color = "red", size = 1.25) +
  geom_ribbon(data = plotting.data, aes(x = biking, ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)
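As for the column names: predict() with interval = 'confidence' returns a matrix with columns fit, lwr and upr, and assigning that matrix to plotting.data$predicted.y stores the whole matrix inside one data frame column, which is why colnames() reports only three names. One possible way around it, sketched below, is to bind the matrix to the data frame with cbind() so that each interval bound becomes an ordinary column. The intermediate name ci is my own choice, and the sketch assumes ggplot2 3.4.0 or later, where linewidth replaces size for lines; on older versions keep size.

```r
library(ggplot2)

# Same data as in the question, written out as a plain data frame.
heartData <- data.frame(
  id = 1:6,
  biking = c(30.80124571, 65.12921517, 1.959664531,
             44.80019562, 69.42845368, 54.40362555),
  smoking = c(10.89660802, 2.219563176, 17.58833051,
              2.802558875, 15.9745046, 29.33317552),
  heart.disease = c(11.76942278, 2.854081478, 17.17780348,
                    6.816646909, 4.062223522, 9.550045997)
)

model.1 <- lm(heart.disease ~ biking + smoking, data = heartData)

plotting.data <- expand.grid(
  biking = seq(min(heartData$biking), max(heartData$biking), length.out = 5),
  smoking = mean(heartData$smoking)
)

# predict() returns a matrix (fit, lwr, upr). cbind() spreads it into three
# regular columns instead of nesting it in a single matrix column.
# `ci` is an illustrative name, not part of the original code.
ci <- predict(model.1, newdata = plotting.data, interval = "confidence")
plotting.data <- cbind(plotting.data, ci)

colnames(plotting.data)

# Each layer names its own data and aesthetics, so nothing is inherited
# from ggplot() and no layer looks for heart.disease in plotting.data.
heart.plot <- ggplot() +
  geom_point(data = heartData, aes(x = biking, y = heart.disease)) +
  geom_line(data = plotting.data, aes(x = biking, y = fit),
            color = "red", linewidth = 1.25) +
  geom_ribbon(data = plotting.data, aes(x = biking, ymin = lwr, ymax = upr),
              alpha = 0.1)
heart.plot
```

With the columns flattened, they can be renamed like any others, and swapping interval = "confidence" for interval = "prediction" in the predict() call should give the prediction interval in the same three-column layout.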