ggplot() for multiple datasets on a linear regression

Time:10-10

This is the dataset:

heartData <- structure(list(id = 1:6, biking = c(30.80124571, 65.12921517, 
1.959664531, 44.80019562, 69.42845368, 54.40362555), smoking = c(10.89660802, 
2.219563176, 17.58833051, 2.802558875, 15.9745046, 29.33317552
), heart.disease = c(11.76942278, 2.854081478, 17.17780348, 6.816646909, 
4.062223522, 9.550045997)), row.names = c(NA, 6L), class = "data.frame")

Here I fit a multiple linear regression model.

model.1 <- lm(heart.disease ~ biking + smoking, data = heartData)

plotting.data is a synthesized data set that I want to use to check the confidence interval as well as the prediction interval.

plotting.data <- expand.grid(
  biking = seq(min(heartData$biking), max(heartData$biking), length.out = 5),
  smoking = c(mean(heartData$smoking)))

plotting.data$predicted.y <- predict(model.1, newdata = plotting.data, interval = 'confidence')
plotting.data$smoking <- round(plotting.data$smoking, digits = 2)
plotting.data$smoking <- as.factor(plotting.data$smoking)

After running the block of code above, I can see that plotting.data has 5 columns; however, when I run

colnames(plotting.data)

I get only 3 column names. plotting.data$predicted.y is a single column, and I can't access or rename plotting.data$predicted.y[,"fit"], plotting.data$predicted.y[,"upr"], or plotting.data$predicted.y[,"lwr"].

To plot results

heart.plot <- ggplot(data = heartData, aes(x = biking, y = heart.disease)) + geom_point() +
    geom_line(data = plotting.data, aes(x = biking, y = predicted.y[,"fit"], color = "red"), size = 1.25) +
    geom_ribbon(data = plotting.data, aes(ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)
heart.plot

I get the error:

Error in FUN(X[[i]], ...) : object 'heart.disease' not found

I don't know why I'm getting this error. From my own trial and error, I know that the following part of the code is causing it. However, I don't know how to write it in a better way.

geom_ribbon(data = plotting.data, aes(ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)

CodePudding user response:

It's because aesthetics named in the aes() wrapper inside ggplot() are inherited by every layer, so each of those variables is expected to exist in whatever data set a geom uses. geom_line() supplies its own y, but geom_ribbon() only sets ymin and ymax, so it still inherits y = heart.disease and looks for that column in plotting.data, where it does not exist. If you want to use multiple data sets that don't share the same variables, give each geom its own aes() wrapper (and pass x to geom_ribbon() explicitly, since it is no longer inherited):

ggplot() +
  geom_point(data = heartData, aes(x = biking, y = heart.disease)) +
  geom_line(data = plotting.data, aes(x = biking, y = predicted.y[,"fit"]), color = "red", size = 1.25) +
  geom_ribbon(data = plotting.data, aes(x = biking, ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)
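
On the side question about colnames() showing only three names: predict() with interval = 'confidence' returns a matrix with columns fit, lwr and upr, and assigning that matrix to plotting.data$predicted.y stores it as one matrix-valued column. Below is a minimal sketch of one way around that, not the only one. It binds the matrix with cbind() so each interval bound becomes an ordinary column. The intermediate name pred is an assumption introduced here for illustration, and linewidth assumes ggplot2 3.4.0 or later (older versions use size).

```r
library(ggplot2)

# Same data as in the question, rebuilt so this sketch runs on its own.
heartData <- data.frame(
  id = 1:6,
  biking = c(30.80124571, 65.12921517, 1.959664531,
             44.80019562, 69.42845368, 54.40362555),
  smoking = c(10.89660802, 2.219563176, 17.58833051,
              2.802558875, 15.9745046, 29.33317552),
  heart.disease = c(11.76942278, 2.854081478, 17.17780348,
                    6.816646909, 4.062223522, 9.550045997)
)

model.1 <- lm(heart.disease ~ biking + smoking, data = heartData)

# Grid of biking values with smoking held at its mean.
plotting.data <- expand.grid(
  biking = seq(min(heartData$biking), max(heartData$biking), length.out = 5),
  smoking = mean(heartData$smoking)
)

# 'pred' (hypothetical name) is a 5 x 3 matrix with columns fit, lwr, upr.
pred <- predict(model.1, newdata = plotting.data, interval = "confidence")

# cbind() on a data frame and a matrix adds each matrix column separately,
# so fit, lwr and upr become regular, renameable columns.
plotting.data <- cbind(plotting.data, pred)
colnames(plotting.data)

# Each layer names its own data and aesthetics, so nothing is inherited
# from ggplot() and no layer looks for a column its data set lacks.
heart.plot <- ggplot() +
  geom_point(data = heartData, aes(x = biking, y = heart.disease)) +
  geom_ribbon(data = plotting.data,
              aes(x = biking, ymin = lwr, ymax = upr), alpha = 0.1) +
  geom_line(data = plotting.data,
            aes(x = biking, y = fit), color = "red", linewidth = 1.25)
heart.plot
```

With the columns flattened this way, colnames(plotting.data) lists biking, smoking, fit, lwr and upr, and the aes() calls no longer need the predicted.y[,"fit"] indexing. For the prediction interval mentioned in the question, the same pattern should work with interval = "prediction" in the predict() call.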
