I have a data frame with multiple variables that have similar endings ("0m","6m" or "12m"). These refer to the month of the data. Some of the variables are only collected once, whilst others at 3 time points ("0m","6m" or "12m"). The data looks like this:
wide= data.frame(id = c(1:5),
agree = c("y","n","n","y","y"),
test1_0m = c(2,3,4,3,5),
test1_6m = c(3,5,2,6,7),
test1_12m = c(6,7,8,4,5),
score_0m = c(55,44,33,22,11),
score_6m = c(77,66,55,44,33),
score_12m = c(99,88,77,66,55))
> wide
id agree test1_0m test1_6m test1_12m score_0m score_6m score_12m
1 1 y 2 3 6 55 77 99
2 2 n 3 5 7 44 66 88
3 3 n 4 2 8 33 55 77
4 4 y 3 6 4 22 44 66
5 5 y 5 7 5 11 33 55
I want to pivot_longer
to get a column Month
which has 0m
, 6m
, or 12m
as entries, plus columns called test1
and score
which have the result for the corresponding person and month.
I've found a really helpful answer here: https://stackoverflow.com/questions/69798752/pivot-longer-for-multiple-sets-having-the-same-names-to
But I don't know how to specifiy the correct regex
to get the values I want.
I've tried this, which is wrong:
wide%>%
pivot_longer(cols = contains("_"),
names_to = c("Month", ".value"),
names_pattern = "(.*\\_)(.*)",
values_drop_na = TRUE )
This is the output I want:
long
id agree Month test1 score
1 1 y 0m 2 55
2 2 n 0m 3 44
3 3 n 0m 4 33
4 4 y 0m 3 22
5 5 y 0m 5 11
6 1 y 6m 3 77
7 2 n 6m 5 66
8 3 n 6m 2 55
9 4 y 6m 6 44
10 5 y 6m 7 33
11 1 y 12m 6 99
12 2 n 12m 7 88
13 3 n 12m 8 77
14 4 y 12m 4 66
15 5 y 12m 5 55
CodePudding user response:
You can use .value
character here it
indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely
Then you also need to define 2 capturing groups with your regex:
wide %>%
pivot_longer(contains('_'),
names_to = c('.value', 'Month'),
names_pattern = '(\\w )_(\\w )')
# A tibble: 15 × 5
id agree Month test1 score
<int> <chr> <chr> <dbl> <dbl>
1 1 y 0m 2 55
2 1 y 6m 3 77
3 1 y 12m 6 99
4 2 n 0m 3 44
5 2 n 6m 5 66
6 2 n 12m 7 88
7 3 n 0m 4 33
8 3 n 6m 2 55
9 3 n 12m 8 77
10 4 y 0m 3 22
11 4 y 6m 6 44
12 4 y 12m 4 66
13 5 y 0m 5 11
14 5 y 6m 7 33
15 5 y 12m 5 55
CodePudding user response:
I think you can just use names_sep =
instead of names_pattern =
, and you need to switch the order of the names (".value", "Month").
wide %>%
pivot_longer(cols = contains("_"),
names_to = c(".value", "Month"),
names_sep = "_")
Result:
# A tibble: 15 × 5
id agree Month test1 score
<int> <chr> <chr> <dbl> <dbl>
1 1 y 0m 2 55
2 1 y 6m 3 77
3 1 y 12m 6 99
4 2 n 0m 3 44
5 2 n 6m 5 66
6 2 n 12m 7 88
7 3 n 0m 4 33
8 3 n 6m 2 55
9 3 n 12m 8 77
10 4 y 0m 3 22
11 4 y 6m 6 44
12 4 y 12m 4 66
13 5 y 0m 5 11
14 5 y 6m 7 33
15 5 y 12m 5 55