Add geom_smooth to ggplot facets conditionally based on p-value-CodePudding

I'm using ggplot to visualize many linear regressions and facet them by groups. I'd like geom_smooth() to show the trend line as one color if P < 0.05, a different color if P < 0.10, and not show it at all if P ≥ 0.10.

I managed to do this using a loop to extract P-values from lm() for each regression, then join them with the data used for plotting. Then I add another column of color names to pass to aes(), determined conditionally from the P-values, and use scale_color_identity() to achieve my goal.

Here's an example:

library(tidyverse)

#make mtcars a tibble and cyl a factor, for convenience
mtcars1 <- as_tibble(mtcars) %>% dplyr::mutate(cyl = as.factor(cyl))

#initialize a list to store p-values from lm() for each level of factor
p.list <- vector(mode = "list", length = length(levels(mtcars1$cyl)))
names(p.list) <- levels(mtcars1$cyl)

#loop to calculate p-values for each level of mtcars$cyl
for(i in seq_along(levels(mtcars1$cyl))){
  mtcars.sub <- mtcars1 %>% dplyr::filter(cyl == levels(.$cyl)[i])
  
  lm.pval <- mtcars.sub %>% 
    dplyr::distinct(cyl) %>% 
    dplyr::mutate(P = 
                    summary(lm(mpg ~ disp, data = mtcars.sub))$coefficients[2,4] ##extract P-value
    )
  
  p.list[[i]] <- lm.pval
}

#join p-values to dataset and add column to use with scale_color_identity()
mtcars.p <- mtcars1 %>% dplyr::left_join(dplyr::bind_rows(p.list, .id = "cyl"), by = "cyl") %>% 
  dplyr::mutate(p.color = ifelse(P < 0.05, "black",
                                 ifelse(P < 0.10, "lightblue", NA)))

#plot
ggplot(data = mtcars.p, aes(x = disp, y = mpg))  
  geom_smooth(method = "lm",
              se = FALSE,
              aes(color = p.color))  
  geom_point()  
  scale_color_identity(name = NULL,
                       na.translate = FALSE,
                       labels = c("P < 0.05", "P < 0.10"),
                       guide = "legend")  
  facet_wrap(~cyl, scales = "free")

This seems like too many initial steps for something that should be relatively easy. Are these steps necessary, or is there a more efficient way of doing this? Can ggplot or any other packages out there do this on their own, without having to first extract p-values from lm()?

CodePudding user response：

We may simplify the steps with a group by operation and also instead of extracting each component, the output can be in a tibble with tidy from broom

library(broom)
library(dplyr)
library(tidyr)
mtcars1 %>% 
   group_by(cyl) %>% 
   summarise(out = list(tidy(lm(mpg ~ disp, data = cur_data())))) %>% 
   unnest(out)

-output

# A tibble: 6 x 6
  cyl   term        estimate std.error statistic    p.value
  <fct> <chr>          <dbl>     <dbl>     <dbl>      <dbl>
1 4     (Intercept) 40.9       3.59       11.4   0.00000120
2 4     disp        -0.135     0.0332     -4.07  0.00278   
3 6     (Intercept) 19.1       2.91        6.55  0.00124   
4 6     disp         0.00361   0.0156      0.232 0.826     
5 8     (Intercept) 22.0       3.35        6.59  0.0000259 
6 8     disp        -0.0196    0.00932    -2.11  0.0568

CodePudding user response：

After specifying your regression function, you can include the line function within ggplot:

myline<-lm(mpg ~ disp, data = mtcars)
ggplot(data = mtcars, aes(x = disp, y = mpg))  
  geom_abline(slope = coef(myline)[[2]], intercept = coef(myline)[[1]],     color='blue') 
  geom_point(color='red')  
  scale_color_identity(name = NULL,
                   na.translate = FALSE,
                   labels = c("P < 0.05", "P < 0.10"),
                   guide = "legend")  
  facet_wrap(~cyl, scales = "free")

The same as above, you can use this geom_smooth() command as well:

geom_smooth(slope = coef(myline)[[2]], intercept = coef(myline)[[1]], color='blue',se=F,method='lm')