Home > database >  Converting to wide with repeating observations
Converting to wide with repeating observations

Time:03-30

I am trying to reshape a data set from long to wide. I know many similar questions have been asked so I apologize if this is a repeat, I just can't seem to find exactly how to do what I need.

I have data like these. Basically there can be multiple observations for one ID and "type".

ID     type     start_date       end_date   

1       A        2021-08-15      2022-01-03
5       B        2016-07-21      2019-01-14
5       A        2017-09-02      2017-12-23
5       A        2014-05-01      2014-09-20 
9       B        2015-01-01      2017-11-13
9       A        2019-10-27      2020-02-09
11      B        2020-07-15      2020-12-07


And in the output I would like to get something like the following columns:

ID     type_A_start_date_1  type_A_end_date_1 type_A_start_date2  type_A_end_date_2  type_B_start_date_1 ...

I have tried "spread" in the reshape package, as well as "pivot_wider" but my issue is I don't know how to account for multiple of the same "type". Any help would be greatly appreciated!

CodePudding user response:

library(tidyverse)

df %>%
  group_by(type) %>%
  mutate(rn = row_number())%>%
  pivot_wider(ID,names_from = c(type, rn), 
              values_from = c(start_date, end_date),
              names_glue = 'type_{type}_{.value}_{rn}')

# A tibble: 4 x 15
     ID type_A_start_date_1 type_B_start_date_1 type_A_start_dat~ type_A_start_da~
  <int> <chr>               <chr>               <chr>             <chr>           
1     1 2021-08-15          NA                  NA                NA              
2     5 NA                  2016-07-21          2017-09-02        2014-05-01      
3     9 NA                  NA                  NA                NA              
4    11 NA                  NA                  NA                NA   

CodePudding user response:

We may use dcast

library(data.table)
dcast(setDT(df1), ID ~ type   rowid(type), 
       value.var = c("start_date", "end_date"))

-output

Key: <ID>
      ID start_date_A_1 start_date_A_2 start_date_A_3 start_date_A_4 start_date_B_1 start_date_B_2 start_date_B_3 end_date_A_1 end_date_A_2 end_date_A_3
   <int>         <char>         <char>         <char>         <char>         <char>         <char>         <char>       <char>       <char>       <char>
1:     1     2021-08-15           <NA>           <NA>           <NA>           <NA>           <NA>           <NA>   2022-01-03         <NA>         <NA>
2:     5           <NA>     2017-09-02     2014-05-01           <NA>     2016-07-21           <NA>           <NA>         <NA>   2017-12-23   2014-09-20
3:     9           <NA>           <NA>           <NA>     2019-10-27           <NA>     2015-01-01           <NA>         <NA>         <NA>         <NA>
4:    11           <NA>           <NA>           <NA>           <NA>           <NA>           <NA>     2020-07-15         <NA>         <NA>         <NA>
   end_date_A_4 end_date_B_1 end_date_B_2 end_date_B_3
         <char>       <char>       <char>       <char>
1:         <NA>         <NA>         <NA>         <NA>
2:         <NA>   2019-01-14         <NA>         <NA>
3:   2020-02-09         <NA>   2017-11-13         <NA>
4:         <NA>         <NA>         <NA>   2020-12-07
  •  Tags:  
  • r
  • Related