I'm working with time series data in wide(ish) format for many columns that all have the format "T1_v1_slow"..."T4_v1_slow" and "T1_v1_fast"..."T4_v1_fast". There are 42 variables with this format in total. I'm trying to use pivot_longer
to return each variable as a column (e.g., "v1_total", "v1_Vig), with a single column for time. Example data below:
ParticipantNumber | T1_v1_total | T2_v1_total | T3_v1_total | T4_v1_total | T1_v1_Vig | T2_v1_Vig | T3_v1_Vig | T4_v1_Vig |
---|---|---|---|---|---|---|---|---|
1 | 41 | 47 | 45 | 28 | 8 | 5 | 8 | 11 |
2 | 16 | 25 | -4 | -4 | 11 | 10 | 18 | 17 |
I can get the data in truly long format using
data %>%
pivot_longer(-ParticipantNumber) %>%
separate(name, into = c("time", "name"), sep = 3)
Output
ParticipantNumber | time | name | value |
---|---|---|---|
1 | T1_ | v1_total | 41 |
1 | T2_ | v1_total | 47 |
1 | T3_ | v1_total | 45 |
1 | T4_ | v1_total | 28 |
1 | T1_ | v1_Vig | 8 |
1 | T2_ | v1_Vig | 5 |
1 | T3_ | v1_Vig | 8 |
1 | T4_ | v1_Vig | 11 |
2 | T1_ | v1_total | 16 |
2 | T2_ | v1_total | 25 |
2 | T3_ | v1_total | -4 |
2 | T4_ | v1_total | -4 |
2 | T1_ | v1_Vig | 11 |
2 | T2_ | v1_Vig | 10 |
2 | T3_ | v1_Vig | 18 |
2 | T4_ | v1_Vig | 17 |
I can't use sep = "_"
because I don't want to grab the second _
. My desired output is:
ParticipantNumber | time | v1_Vig | v1_total |
---|---|---|---|
1 | T1_ | 8 | 41 |
1 | T2_ | 5 | 47 |
1 | T3_ | 8 | 45 |
1 | T4_ | 5 | 28 |
2 | T1_ | 11 | 16 |
2 | T2_ | 10 | 25 |
2 | T3_ | 18 | -4 |
2 | T4_ | 17 | -4 |
CodePudding user response:
Simply use pivot_wider
data %>%
pivot_longer(-ParticipantNumber) %>%
separate(name, into = c("time", "name"), sep = 3) %>%
pivot_wider(values_fn=sum)
CodePudding user response:
data.frame(
ParticipantNumber = c(1L, 2L),
T1_v1_total = c(41L, 16L),
T2_v1_total = c(47L, 25L),
T3_v1_total = c(45L, -4L),
T4_v1_total = c(28L, -4L),
T1_v1_Vig = c(8L, 11L),
T2_v1_Vig = c(5L, 10L),
T3_v1_Vig = c(8L, 18L),
T4_v1_Vig = c(11L, 17L)
) %>%
pivot_longer(-ParticipantNumber, names_to = c("time", ".value"), names_pattern = "(.)_(.*)")
## A tibble: 8 × 4
# ParticipantNumber time v1_total v1_Vig
# <int> <chr> <int> <int>
#1 1 1 41 8
#2 1 2 47 5
#3 1 3 45 8
#4 1 4 28 11
#5 2 1 16 11
#6 2 2 25 10
#7 2 3 -4 18
#8 2 4 -4 17