Plotting confidence intervals of predicted values on partial regression plot

Time:01-18

I am trying to make a partial regression plot of a linear model, in which the points show the actual (adjusted) values against the x-axis variable, but the displayed trend line comes from the predict() function. As it is, my plot shows jagged confidence intervals. How can I fix this? My model looks at the effect of a metric of sleep disturbance on gene expression, controlling for age, sex, and education.

This is what I've done:

fit1<-lm(formula = (DF[, "gene.expression"]) ~ DF[, "sleep.metric"] +
           DF[, "age_death"] + DF[, "msex"] + DF[, "educ"])
pred<-fit1[["terms"]][[3]][[2]][[2]][[2]][[4]]
outcome<-fit1[["terms"]][[2]][[2]][[4]]
covars_list<-list(c('age_death', 'msex', 'educ'))
    n<-length(fit1[["residuals"]])
    b1slope<-coef(fit1)[2] #beta of sleep
    x_var<-eval(parse(text=paste('DF$',pred,'[!is.na(DF$',pred,')]')))#values of sleep
    b0intercept<-coef(fit1)['(Intercept)']
    resid<-fit1$resid
    y.fit<-fit1$fitted.values
    
    #add back residuals to fitted vals to get actual y
    y.adj<-b1slope*x + b0intercept + resid #gene expression

Since my data is scaled and centered, I make a new data frame containing the values of sleep.metric and 0 for all other variables:

covars_list<-list(c('age_death', 'msex', 'educ'))    
terms<-c(pred, outcome, unlist(covars_list))
txt<-paste0('c("',paste(terms, collapse='","'),'")')
nf<-paste0('DF[,',txt,']', sep="")
nf<-eval(parse(text=nf))
nf[,2:dim(nf)[2]]=0


head(nf)
    sleep.metric gene.exp age_death msex educ
45    0.07818673        0         0    0    0
81    0.13795502        0         0    0    0
131  -0.34721989        0         0    0    0
132  -0.74577821        0         0    0    0
189  -0.13113761        0         0    0    0
190   0.25137619        0         0    0    0

Then I predict the model on the new data frame:

p<-predict(fit1, newdata=data.frame(nf), se.fit=F, interval='confidence',level = 0.95)
p<-na.omit(p)
x<-x[order(x)]
y.adj<-y.adj[order(x)]
p<-p[order(x),]
fit<-p[,1]
lower<-p[,2]
upper<-p[,3]


plot(y.adj~x,
     pch=16, col='gray', cex=0.5,
     xlab="A metric of sleep disturbances \n adjusted for Gene expression, Age, Sex, and Education",
     ylab="Adjusted Normalized z-score of Differentially Expressed Genes")
lines(fit~x,col="blue", lty=1)
lines(upper~x, col="lightblue", lty=2)
lines(lower~x, col="lightblue", lty=2)

This is what the plot looks like. What is causing the error and how can I fix it?

[Image: scatter plot of the adjusted values with the fitted line and jagged confidence bands]

Note that you are effectively plotting the marginal effect of sleep.metric against the mean of the other covariates. In your case this results in a very similar plot to a straightforward geom_smooth(method = lm) of the two variables shown in your plot, so without any of the above code, you could just do:

library(ggplot2)

ggplot(DF, aes(sleep.metric, gene.exp)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, fill = "lightblue",
              color = "darkblue", linetype = 2, alpha = 0.3, linewidth = 0.5) +
  theme_minimal(base_size = 16)
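
If you would rather keep the base-graphics version, the jaggedness most likely comes from the sorting step. The line `x<-x[order(x)]` overwrites `x` with its sorted values, so the later `order(x)` calls simply return 1, 2, ..., n, and `y.adj` and `p` stay in their original row order while `x` is sorted. Below is a minimal sketch of one way around this: compute the ordering once and reuse it. The data are simulated stand-ins for your DF (hypothetical values, and the column name gene.exp is an assumption), so only the ordering logic is meant to carry over.

```r
# Simulated stand-in for the question's DF (hypothetical data, illustration only)
set.seed(1)
n  <- 200
DF <- data.frame(sleep.metric = rnorm(n), age_death = rnorm(n),
                 msex = rnorm(n), educ = rnorm(n))
DF$gene.exp <- 0.3 * DF$sleep.metric + 0.2 * DF$age_death + rnorm(n)
DF[] <- lapply(DF, function(v) as.numeric(scale(v)))  # center and scale

fit1 <- lm(gene.exp ~ sleep.metric + age_death + msex + educ, data = DF)

# New data: sleep.metric varies, other covariates held at 0 (their mean)
nf <- data.frame(sleep.metric = DF$sleep.metric,
                 age_death = 0, msex = 0, educ = 0)
p  <- predict(fit1, newdata = nf, interval = "confidence", level = 0.95)

# Adjusted outcome: sleep component + intercept + residuals
x     <- DF$sleep.metric
y.adj <- coef(fit1)["sleep.metric"] * x + coef(fit1)["(Intercept)"] + resid(fit1)

# Compute the ordering ONCE, before anything is overwritten, and reuse it
ord <- order(x)
x_s <- x[ord]
p_s <- p[ord, ]

plot(y.adj ~ x, pch = 16, col = "gray", cex = 0.5)  # points need no sorting
lines(x_s, p_s[, "fit"], col = "blue", lty = 1)
lines(x_s, p_s[, "upr"], col = "lightblue", lty = 2)
lines(x_s, p_s[, "lwr"], col = "lightblue", lty = 2)
```

Because `lines()` connects points in the order it receives them, the x values and the predictions have to be sorted by the same index. The points themselves can stay unsorted as long as `y.adj` and `x` are in the same row order.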


CodePudding user response:

You could use the plot_cap() function from the marginaleffects package to do much of the heavy lifting for you. (Disclaimer: I am the maintainer.)

library(ggplot2)
library(marginaleffects)

# center
DF[] <- lapply(DF, scale)
DF[] <- lapply(DF, c)

# fit
fit1 <- lm(gene.exp ~ sleep.metric + age_death + msex + educ, data = DF)

# plot
plot_cap(fit1, condition = "sleep.metric") +
    geom_point(data = DF, aes(sleep.metric, gene.exp)) +
    labs(x = "Sleep metric", y = "Gene expression") +
    theme_minimal()
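
If you prefer not to add a dependency, a similar plot can be sketched by hand with predict() and geom_ribbon(). This is only a rough equivalent under stated assumptions, not a description of what plot_cap() does internally: the data are simulated (hypothetical), the other covariates are fixed at 0 because the variables are centered, and the grid of 100 evenly spaced sleep.metric values is an arbitrary choice. Predicting on a sorted grid also sidesteps the ordering problem behind the jagged bands.

```r
library(ggplot2)

# Hypothetical simulated data standing in for the question's DF
set.seed(42)
n  <- 150
DF <- data.frame(sleep.metric = rnorm(n), age_death = rnorm(n),
                 msex = rbinom(n, 1, 0.5), educ = rnorm(n))
DF$gene.exp <- 0.4 * DF$sleep.metric - 0.2 * DF$educ + rnorm(n)
DF[] <- lapply(DF, function(v) as.numeric(scale(v)))  # center and scale

fit1 <- lm(gene.exp ~ sleep.metric + age_death + msex + educ, data = DF)

# Evenly spaced (already sorted) grid over sleep.metric; other covariates at 0
grid <- data.frame(
  sleep.metric = seq(min(DF$sleep.metric), max(DF$sleep.metric),
                     length.out = 100),
  age_death = 0, msex = 0, educ = 0
)
ci   <- predict(fit1, newdata = grid, interval = "confidence", level = 0.95)
grid <- cbind(grid, ci)  # adds columns fit, lwr, upr

g <- ggplot(grid, aes(sleep.metric, fit)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "lightblue", alpha = 0.3) +
  geom_line(color = "darkblue") +
  geom_point(data = DF, aes(sleep.metric, gene.exp), alpha = 0.5) +
  labs(x = "Sleep metric", y = "Gene expression") +
  theme_minimal()
print(g)
```

As far as I know, more recent releases of marginaleffects expose the same plot under the name plot_predictions(), so it is worth checking the documentation for your installed version if plot_cap() is not found.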
