I am trying to reshape a data set from long to wide. I know many similar questions have been asked so I apologize if this is a repeat, I just can't seem to find exactly how to do what I need.
I have data like these. Basically there can be multiple observations for one ID and "type".
ID type start_date end_date
1 A 2021-08-15 2022-01-03
5 B 2016-07-21 2019-01-14
5 A 2017-09-02 2017-12-23
5 A 2014-05-01 2014-09-20
9 B 2015-01-01 2017-11-13
9 A 2019-10-27 2020-02-09
11 B 2020-07-15 2020-12-07
And in the output I would like to get something like the following columns:
ID type_A_start_date_1 type_A_end_date_1 type_A_start_date2 type_A_end_date_2 type_B_start_date_1 ...
I have tried "spread" in the reshape package, as well as "pivot_wider" but my issue is I don't know how to account for multiple of the same "type". Any help would be greatly appreciated!
CodePudding user response:
library(tidyverse)
df %>%
group_by(type) %>%
mutate(rn = row_number())%>%
pivot_wider(ID,names_from = c(type, rn),
values_from = c(start_date, end_date),
names_glue = 'type_{type}_{.value}_{rn}')
# A tibble: 4 x 15
ID type_A_start_date_1 type_B_start_date_1 type_A_start_dat~ type_A_start_da~
<int> <chr> <chr> <chr> <chr>
1 1 2021-08-15 NA NA NA
2 5 NA 2016-07-21 2017-09-02 2014-05-01
3 9 NA NA NA NA
4 11 NA NA NA NA
CodePudding user response:
We may use dcast
library(data.table)
dcast(setDT(df1), ID ~ type rowid(type),
value.var = c("start_date", "end_date"))
-output
Key: <ID>
ID start_date_A_1 start_date_A_2 start_date_A_3 start_date_A_4 start_date_B_1 start_date_B_2 start_date_B_3 end_date_A_1 end_date_A_2 end_date_A_3
<int> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char>
1: 1 2021-08-15 <NA> <NA> <NA> <NA> <NA> <NA> 2022-01-03 <NA> <NA>
2: 5 <NA> 2017-09-02 2014-05-01 <NA> 2016-07-21 <NA> <NA> <NA> 2017-12-23 2014-09-20
3: 9 <NA> <NA> <NA> 2019-10-27 <NA> 2015-01-01 <NA> <NA> <NA> <NA>
4: 11 <NA> <NA> <NA> <NA> <NA> <NA> 2020-07-15 <NA> <NA> <NA>
end_date_A_4 end_date_B_1 end_date_B_2 end_date_B_3
<char> <char> <char> <char>
1: <NA> <NA> <NA> <NA>
2: <NA> 2019-01-14 <NA> <NA>
3: 2020-02-09 <NA> 2017-11-13 <NA>
4: <NA> <NA> <NA> 2020-12-07