How to make a ggplot time series plot from an R data frame using multiple columns?

I have this data frame, from which I would like to make an overlapping time series plot by ID, using four relevant data points for each ID. I don't know how to have all of these show on top of each other. I want the lines to be divided into two colors: green for those which finish at a lower value than they began with, and red for the opposite.

I don't know if I need to change my data frame first or if I can do all the work with ggplot.

Here's a sample of my data (which has 38 total rows). All columns after ID should be points on the x axis, and the y axis should run from 0 to 3.

      ID Starting_BAI HYP1_PostSession_1 HYP1_PostSession_4 HYP1_PostSession_8
 1     1            2                   1                   1                   0
 2     2            3                   3                   2                   1
 3     3            2                   2                   0                   0
 4     4            3                   3                   2                   0
 5     5            3                   2                   2                   1
 6     6            2                   2                   1                   1
 7     7            2                   2                   1                   0
 8     8            3                   3                   2                   1
 9     9            3                   3                   2                   0
10    10            3                   3                   1                   0

Answer:

I might rename your baseline value to HYP1_Session_0 or something like that, so that the final number at the end indicates "time" and would be consistent with the other column names.

It will help to put your data into long (tidy) form instead of wide. This means each time observation for a particular ID is one row.

Using pivot_longer, your data can look like this in long form:

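library(tidyverse)  # attaches dplyr, tidyr and ggplot2

df %>%
  # rename the baseline so every column name ends in its session number
  rename(HYP1_Session_0 = Starting_BAI) %>%
  # one row per ID and time point; the trailing digits become numeric time
  pivot_longer(cols = -ID, names_to = "time", values_to = "value",
               names_pattern = "(\\d+)$",
               names_transform = list(time = as.numeric))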

# A tibble: 40 × 3
      ID  time value
   <int> <dbl> <int>
 1     1     0     2
 2     1     1     1
 3     1     4     1
 4     1     8     0
 5     2     0     3
 6     2     1     3
 7     2     4     2
 8     2     8     1
 9     3     0     2
10     3     1     2
# … with 30 more rows

After that, you can group_by your ID and add an indicator column, change, that compares the values at time 0 and time 8 (for coloring). It is TRUE when the value at time 8 is lower than the value at time 0.
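To see what that indicator looks like on its own, here is a minimal sketch using the test data from the end of this answer. The names long_df and change_by_id are only illustrative; they are not part of the original code.

```r
library(dplyr)
library(tidyr)

# Test data from the "Data" section (ID 6 was edited to end higher than it began)
df <- data.frame(
  ID = 1:10,
  Starting_BAI       = c(2L, 3L, 2L, 3L, 3L, 2L, 2L, 3L, 3L, 3L),
  HYP1_PostSession_1 = c(1L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 3L),
  HYP1_PostSession_4 = c(1L, 2L, 0L, 2L, 2L, 1L, 1L, 2L, 2L, 1L),
  HYP1_PostSession_8 = c(0L, 1L, 0L, 0L, 1L, 3L, 0L, 1L, 0L, 0L)
)

long_df <- df %>%
  rename(HYP1_Session_0 = Starting_BAI) %>%
  pivot_longer(cols = -ID, names_to = "time", values_to = "value",
               names_pattern = "(\\d+)$",
               names_transform = list(time = as.numeric)) %>%
  group_by(ID) %>%
  # within each ID: is the session-8 value lower than the baseline?
  mutate(change = value[time == 8] < value[time == 0]) %>%
  ungroup()

# One row per ID: ID 6 is FALSE, the other nine IDs are TRUE
change_by_id <- distinct(long_df, ID, change)
print(change_by_id)
```

Note that an ID whose first and last values are equal also gets FALSE with this comparison, so it would be colored the same as an ID that increased.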

Finally, you can use ggplot to plot your spaghetti plot. Here we can indicate color based on change. In addition, position_jitter prevents lines from completely overlapping each other.

Note: I changed the data so that one of the IDs (ID #6) would result in a different color (red) - just for testing.

library(tidyverse)  # attaches dplyr, tidyr and ggplot2

df %>%
  rename(HYP1_Session_0 = Starting_BAI) %>%
  pivot_longer(cols = -ID, names_to = "time", values_to = "value",
               names_pattern = "(\\d+)$",
               names_transform = list(time = as.numeric)) %>%
  group_by(ID) %>%
  # TRUE when the session-8 value is lower than the baseline
  mutate(change = value[time == 8] < value[time == 0]) %>%
  ggplot(aes(x = time, y = value, group = ID, color = change)) +
    geom_line(position = position_jitter(width = 0, height = .1)) +
    scale_x_continuous(breaks = c(0, 1, 4, 8)) +
    # colors are assigned to FALSE, TRUE in that order: red, then green
    scale_color_manual(values = c("red", "green"))
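A possible variation on the plot code above, offered as a sketch rather than a required change: position_jitter draws random offsets, so the lines move slightly on every run unless a seed is set, and a named vector in scale_color_manual makes the color mapping explicit instead of relying on the order of FALSE and TRUE. The seed value 1 and the object names long_df and p are arbitrary choices for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Test data from the "Data" section of this answer
df <- data.frame(
  ID = 1:10,
  Starting_BAI       = c(2L, 3L, 2L, 3L, 3L, 2L, 2L, 3L, 3L, 3L),
  HYP1_PostSession_1 = c(1L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 3L),
  HYP1_PostSession_4 = c(1L, 2L, 0L, 2L, 2L, 1L, 1L, 2L, 2L, 1L),
  HYP1_PostSession_8 = c(0L, 1L, 0L, 0L, 1L, 3L, 0L, 1L, 0L, 0L)
)

long_df <- df %>%
  rename(HYP1_Session_0 = Starting_BAI) %>%
  pivot_longer(cols = -ID, names_to = "time", values_to = "value",
               names_pattern = "(\\d+)$",
               names_transform = list(time = as.numeric)) %>%
  group_by(ID) %>%
  mutate(change = value[time == 8] < value[time == 0]) %>%
  ungroup()

p <- ggplot(long_df, aes(x = time, y = value, group = ID, color = change)) +
  # seed fixes the random vertical offsets so the plot is reproducible
  geom_line(position = position_jitter(width = 0, height = 0.1, seed = 1)) +
  scale_x_continuous(breaks = c(0, 1, 4, 8)) +
  # explicit mapping: decreased (TRUE) is green, otherwise (FALSE) red
  scale_color_manual(values = c("TRUE" = "green", "FALSE" = "red"))

print(p)
```

Because the jitter is applied to every point, the plotted y values sit up to 0.1 away from the true integer scores; set height = 0 if exact positions matter more than separating the lines.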

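Output: one line per ID across sessions 0, 1, 4 and 8. With the test data below, ID 6 is drawn in red and the other nine IDs in green.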

Data

df <- structure(list(ID = 1:10, Starting_BAI = c(2L, 3L, 2L, 3L, 3L, 
2L, 2L, 3L, 3L, 3L), HYP1_PostSession_1 = c(1L, 3L, 2L, 3L, 2L, 
2L, 2L, 3L, 3L, 3L), HYP1_PostSession_4 = c(1L, 2L, 0L, 2L, 2L, 
1L, 1L, 2L, 2L, 1L), HYP1_PostSession_8 = c(0L, 1L, 0L, 0L, 1L, 
3L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))