Plotting caterpillar plots for multiple dependent variables


I'm trying to make caterpillar plots for several dependent variables that share the same independent variables, using ggplot2.

With this trial data,

x1 <- c(1,2,3,4,5)
x2 <- c(4,5,6,1,2)
x3 <- c(7,8,3,2,1)
x4 <- c(3,4,5,6,7)
y1 <- c(10,9,11,18,13)
y2 <- c(12,20,21,19,11)
y3 <- c(18,16,12,11,18)
df <- base::data.frame(x1, x2, x3, x4, y1, y2, y3)

At first I tried to make the plots without facets.

lm1 <- lm(data = df,
          y1 ~ x1 + x2 + x3 + x4)

lm_plot1 <- lm1 %>%
 broom::tidy() %>%
 ggplot2::ggplot(aes(x = term, y = estimate))

# Make two more lm_plots for y2, y3 in the same way...
# And put them on one screen by grid.arrange()

gridExtra::grid.arrange(lm_plot1, lm_plot2, lm_plot3) 

Now I'd like to make a faceted plot, so I tried reshaping the data to long format:

df %>% 
  tidyr::pivot_longer(c(y1,y2,y3)) %>% 
  dplyr::rename(dv = name) %>% 

but I don't know how to fit a separate model for each group after group_by, or how to build the faceted plot with ggplot2.

One approach is to nest() the data using tidyr.

First let's load libraries and make some better example data:

library(tidyverse)
library(broom)

df1 <- matrix(sample(1:100, 350, replace = TRUE), 
              ncol = 7, 
              dimnames = list(NULL, c("x1", "x2", "x3", "x4", "y1", "y2", "y3")))
df1 <- as.data.frame(df1)

If we pivot_longer and nest on name, we get one nested tibble per outcome (y1, y2, y3). Each holds the predictors x1 to x4 plus a value column containing that outcome's values.

df1 %>% 
  pivot_longer(cols = starts_with("y")) %>% 
  nest(data = -name)

  name  data             
  <chr> <list>           
1 y1    <tibble [50 × 5]>
2 y2    <tibble [50 × 5]>
3 y3    <tibble [50 × 5]>
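To see what ends up inside the list column, it can help to pull out one element. The sketch below uses a small hypothetical data frame called toy (10 rows, arbitrary seed) rather than df1, so the numbers are illustrative only:

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: arbitrary seed, only so the sketch is reproducible
# `toy` is a hypothetical stand-in with the same column layout as df1
toy <- as.data.frame(matrix(
  sample(1:100, 70, replace = TRUE), ncol = 7,
  dimnames = list(NULL, c("x1", "x2", "x3", "x4", "y1", "y2", "y3"))
))

nested <- toy %>%
  pivot_longer(cols = starts_with("y")) %>%
  nest(data = -name)

# Each element of the list column is an ordinary tibble:
# the four predictors plus `value` for that outcome
names(nested$data[[1]])
nrow(nested$data[[1]])
```

Because every nested tibble has the same columns, the same model formula can be applied to each one.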

Now we can use purrr::map to make a model for each y-value:

df1 %>% 
  pivot_longer(cols = starts_with("y")) %>% 
  nest(data = -name) %>% 
  mutate(model = map(data, ~lm(value ~ x1 + x2 + x3 + x4, data = .)), 
         tidydata = map(model, tidy))

  name  data              model  tidydata        
  <chr> <list>            <list> <list>          
1 y1    <tibble [50 × 5]> <lm>   <tibble [5 × 5]>
2 y2    <tibble [50 × 5]> <lm>   <tibble [5 × 5]>
3 y3    <tibble [50 × 5]> <lm>   <tibble [5 × 5]>

Finally, remove unwanted columns and unnest the nested tidied output:

df1 %>% 
  pivot_longer(cols = starts_with("y")) %>% 
  nest(data = -name) %>% 
  mutate(model = map(data, ~lm(value ~ x1 + x2 + x3 + x4, data = .)), 
         tidydata = map(model, tidy)) %>% 
  select(-data, -model) %>% 
  unnest(cols = "tidydata")

# A tibble: 15 × 6
   name  term        estimate std.error statistic    p.value
   <chr> <chr>          <dbl>     <dbl>     <dbl>      <dbl>
 1 y1    (Intercept) 41.1        12.9      3.17   0.00272   
 2 y1    x1          -0.0379      0.134   -0.283  0.778     
 3 y1    x2           0.0889      0.142    0.628  0.533     
 4 y1    x3           0.0979      0.155    0.632  0.530     
 5 y1    x4           0.144       0.151    0.954  0.345     
 6 y2    (Intercept) 69.7        13.8      5.04   0.00000810
 7 y2    x1           0.00506     0.143    0.0354 0.972     
 8 y2    x2          -0.251       0.151   -1.66   0.105     
 9 y2    x3           0.00999     0.166    0.0603 0.952     
10 y2    x4          -0.278       0.161   -1.73   0.0910    
11 y3    (Intercept) 54.7        14.6      3.75   0.000503  
12 y3    x1           0.0645      0.150    0.428  0.670     
13 y3    x2          -0.0406      0.159   -0.255  0.800     
14 y3    x3          -0.0645      0.174   -0.370  0.713     
15 y3    x4          -0.0788      0.170   -0.465  0.644
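Since the question mentions group_by, a similar coefficient table can probably also be built without an explicit list column, using dplyr::group_modify. This is a sketch of an alternative, not the approach used above; toy is a hypothetical data frame with an arbitrary seed:

```r
library(dplyr)
library(tidyr)
library(broom)

set.seed(123)  # assumption: arbitrary seed for reproducibility
# `toy` is a hypothetical stand-in with the same column layout as df1
toy <- as.data.frame(matrix(
  sample(1:100, 140, replace = TRUE), ncol = 7,
  dimnames = list(NULL, c("x1", "x2", "x3", "x4", "y1", "y2", "y3"))
))

by_group <- toy %>%
  pivot_longer(cols = starts_with("y")) %>%
  group_by(name) %>%
  # .x is the data for one group; the function must return a data frame
  group_modify(~ tidy(lm(value ~ x1 + x2 + x3 + x4, data = .x))) %>%
  ungroup()

by_group
```

The nest/map version keeps the fitted model objects around for later use (for example residual checks), whereas this version keeps only the tidied coefficients.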

Now we can ggplot with facets. I'm assuming that your "caterpillar plot" is similar to a "forest plot". So something like this:

df1 %>% 
  pivot_longer(cols = starts_with("y")) %>% 
  nest(data = -name) %>% 
  mutate(model = map(data, ~lm(value ~ x1 + x2 + x3 + x4, data = .)), 
         tidydata = map(model, tidy)) %>% 
  select(-data, -model) %>% 
  unnest(cols = "tidydata") %>% 
  ggplot(aes(term, estimate)) + 
  geom_pointrange(aes(ymin = estimate - std.error, 
                      ymax = estimate + std.error)) + 
  facet_wrap(~name)
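If the goal is a more conventional forest-style layout, one possible variant is sketched below. It makes several assumptions that are not part of the code above: 95% confidence intervals from tidy(conf.int = TRUE) instead of plus or minus one standard error, the intercept dropped, a dashed zero line, and flipped axes. The data frame toy and the seed are hypothetical:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(ggplot2)

set.seed(42)  # assumption: arbitrary seed for reproducibility
# `toy` is a hypothetical stand-in with the same column layout as df1
toy <- as.data.frame(matrix(
  sample(1:100, 350, replace = TRUE), ncol = 7,
  dimnames = list(NULL, c("x1", "x2", "x3", "x4", "y1", "y2", "y3"))
))

coefs <- toy %>%
  pivot_longer(cols = starts_with("y")) %>%
  nest(data = -name) %>%
  mutate(model = map(data, ~ lm(value ~ x1 + x2 + x3 + x4, data = .x)),
         # conf.int = TRUE adds conf.low / conf.high (95% by default)
         tidydata = map(model, tidy, conf.int = TRUE)) %>%
  select(name, tidydata) %>%
  unnest(cols = tidydata) %>%
  filter(term != "(Intercept)")  # the intercept usually dwarfs the slopes

p <- ggplot(coefs, aes(term, estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
  coord_flip() +          # terms on the vertical axis, forest-plot style
  facet_wrap(~name)       # one panel per dependent variable

p
```

Whether to drop the intercept or use standard errors rather than confidence intervals is a presentation choice; the pipeline itself is the same in both cases.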


