How to index a dataframe for using ggplot in a loop-CodePudding

I am struggling with creating multiple ggplots using a loop. I use data in the following format:

a <- c(1,2,3,4)
b <- c(5,6,7,8)
c <- c(9,10,11,12)
d <- c(13,14,15,16)
time <- c(1,2,3,4)
data <- cbind(a,b,c,d,time)

What I want to create is a list of plots that plot one of the letters against the variable time. Which I tried in the following way:

library(ggplot2)
library(gridExtra)

plots <- list()
for (i in 1:4){
    plots[[i]] <- ggplot()   geom_line(data = data, aes(x = time, y = data[,i]))
}
grid.arrange(plots[[1]], plots[[2]], plots[[3]], plots[[4]])

This results in four times the fourth plot. How do I index this correctly in a way that creates the four intended plots?

CodePudding user response：

(Up front: the reason that your plots are all identical is due to ggplot's "lazy" evaluation of code. See my #2 below, where I identify that the data[,i] is evaluated when you try to plot the data, at which point i is 4, the last pass in the for loop.)

It's generally preferred/recommended to use data.frames instead of matrices or vectors (as you're doing here). It gives a bit more power and control.
```
data <- data.frame(a,b,c,d,time)
```
Also, I tend to prefer lapply to for-loops and lists, for various (some subjective) reasons. Ultimately, the issue you're having is that ggplot2 is evaluating the data lazily, so plots is a list with four plots that make reference to i ... and that is realized when you try to plot them all, at which point i is 4 (from the last pass through the loop). One benefit of using lapply is that the i referenced is a local-only (inside of the anon-func) version of i that is preserved as you would expect.
```
plots <- lapply(names(data)[1:4],
  function(nm) ggplot(data, aes(x = time, y = .data[[nm]]))   geom_line())
gridExtra::grid.arrange(plots[[1]], plots[[2]])
```
I also prefer patchwork to gridExtra, mostly because it makes more-customized layouts a bit more intuitive, plus adds functionality such as axis-alignment, shared legends, shared titles, etc. (None of those other features are demonstrated here.)
```
library(patchwork)
plots[[1]] / plots[[2]] # same plot
plots[[1]]   plots[[2]] # side-by-side instead of top/bottom
(plots[[1]]   plots[[2]]) / (plots[[3]]   plots[[4]]) # grid
```
Ultimately, though, I suggest that facets can be useful and very powerful. For this, we need to melt/pivot the data into a "long format" so that the column names a-b are actually in one column.
```
reshape2::melt(data, id.vars = "time") |>
  ggplot(aes(time, value))  
  geom_line()  
  facet_grid(variable ~ ., scales = "free_y")
```
I assumed the preference for independent (free) y-scales, ergo the scales="free_y". Try it without if you want to see the options. (There are also scales="free_x" and scales="free" (both).)

To see what I mean by "long" format:
```
reshape2::melt(data, id.vars = "time")
#    time variable value
# 1     1        a     1
# 2     2        a     2
# 3     3        a     3
# 4     4        a     4
# 5     1        b     5
# 6     2        b     6
# 7     3        b     7
# 8     4        b     8
# 9     1        c     9
# 10    2        c    10
# 11    3        c    11
# 12    4        c    12
# 13    1        d    13
# 14    2        d    14
# 15    3        d    15
# 16    4        d    16
```
This can also be done with tidyr::pivot_longer(data, -time), albeit the variable name is now name. For this use, there is no advantage to reshape2::melt or tidyr::pivot_longer; there are opportunities for significantly more complex pivoting in the latter, not relevant with this data.

Data

data <- structure(list(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8), c = c(9, 10, 11, 12), d = c(13, 14, 15, 16), time = c(1, 2, 3, 4)), class = "data.frame", row.names = c(NA, -4L))