Home > Software engineering >  How do we transform a dataset in R using pivot_longer with multiple columns
How do we transform a dataset in R using pivot_longer with multiple columns

Time:03-04

I have reviewed many posts on SO (including Using pivot_longer with multiple paired columns in the wide dataset), but have not found a solution to what I need to do.

having initial dataset:

df <- tribble(
~person,  ~initial_event_date , ~type_initial, ~visit_prior, ~day_cnt_prior, ~prior_visit_type, ~visit_after, ~day_cnt_after, ~visit_after_type,
'a' , '01-01-2020', 'repair' ,'N', '', '', 'Y','15', 'follow-up',
'b' , '01-17-2020', 'routine' ,'Y', '-4', 'repair', 'N','', '',
'c' , '02-11-2020', 'consult' ,'Y', '-2', 'routine', 'Y','22', 'follow-up',
'd' , '04-01-2020', 'repair' ,'N', '', '', 'Y','12', 'correction'
)

I would like to output a dataframe similar to below because I intend to visualize the data on a time based plot using the timevis package.

output <- tribble(
  ~person, ~event_date, ~instance, ~type, ~day_cnt,
  'a', '01-01-2020', 'initial'    ,'repair'     ,'0',
  'a', ''          , 'visit_after', 'follow-up' , '15',
  'b', '01-17-2020', 'initial'    , 'routine'   ,'0',
  'b', ''          , 'visit_prior','repair'     ,'-4',
  'c', '02-11-2020', 'initial'    , 'consult'   ,'0',
  'c', ''          , 'visit_prior', 'routine'   , '-2',
  'c', ''          , 'visit_after', 'follow-up' ,'22',
  'd', '04-01-2020', 'initial'    ,'repair'     ,'0',
  'd', ''          , 'visit_after', 'correction','12'
)

I have tried multiple variations of pivot_longer such as :

df %>% pivot_longer(
  cols = c(type_initial,prior_visit_type, visit_after_type), 
  names_to = 'instance',
  values_to = 'day_cnt'
)

Any suggestions or other SO posts that may point me to the solution I am looking for?

CodePudding user response:

This will get you started

library(tidyr)

 pivot_longer(df, 
 cols=c(initial_event_date, type_initial,prior_visit_type, visit_after_type, day_cnt_after), 
                names_to = c('.value', 'instance'), 
                names_sep = "_")

Sample data:

df<-structure(list(person = c("a", "b", "c", "d"), initial_event_date = c("01-01-2020", 
"01-17-2020", "02-11-2020", "04-01-2020"), type_initial = c("repair", 
"routine", "consult", "repair"), visit_prior = c("N", "Y", "Y", 
"N"), day_cnt_prior = c("", "-4", "-2", ""), prior_visit_type = c("", 
"repair", "routine", ""), visit_after = c("Y", "N", "Y", "Y"), 
    day_cnt_after = c("15", "", "22", "12"), visit_after_type = c("follow-up", 
    "", "follow-up", "correction")), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

Output:

person visit_prior day_cnt_prior visit_after instance initial    type    prior     visit        day  
   <chr>  <chr>       <chr>         <chr>       <chr>    <chr>      <chr>   <chr>     <chr>        <chr>
 1 a      N           ""            Y           event    01-01-2020 NA       NA        NA           NA  
 2 a      N           ""            Y           initial  NA         repair   NA        NA           NA  
 3 a      N           ""            Y           visit    NA         NA      ""         NA           NA  
 4 a      N           ""            Y           after    NA         NA       NA       "follow-up"   NA  
 5 a      N           ""            Y           cnt      NA         NA       NA        NA          "15" 
 6 b      Y           "-4"          N           event    01-17-2020 NA       NA        NA           NA  
 7 b      Y           "-4"          N           initial  NA         routine  NA        NA           NA  
 8 b      Y           "-4"          N           visit    NA         NA      "repair"   NA           NA  
 9 b      Y           "-4"          N           after    NA         NA       NA       ""            NA  
10 b      Y           "-4"          N           cnt      NA         NA       NA        NA          ""   
11 c      Y           "-2"          Y           event    02-11-2020 NA       NA        NA           NA  
12 c      Y           "-2"          Y           initial  NA         consult  NA        NA           NA  
13 c      Y           "-2"          Y           visit    NA         NA      "routine"  NA           NA  
14 c      Y           "-2"          Y           after    NA         NA       NA       "follow-up"   NA  
15 c      Y           "-2"          Y           cnt      NA         NA       NA        NA          "22" 
16 d      N           ""            Y           event    04-01-2020 NA       NA        NA           NA  
17 d      N           ""            Y           initial  NA         repair   NA        NA           NA  
18 d      N           ""            Y           visit    NA         NA      ""         NA           NA  
19 d      N           ""            Y           after    NA         NA       NA       "correction"  NA  
20 d      N           ""            Y           cnt      NA         NA       NA        NA          "12" 

CodePudding user response:

Probably not the most elegant solution, but I was able to solve my own problem using the steps below:

a <- df %>% 
  select(person,initial_event_date, type_initial) %>%
  mutate(visit_type = 'initial')
b <- df %>%
  filter(visit_prior == 'Y') %>%
  select(person, initial_event_date, prior_visit_type, day_cnt_prior) %>% 
  mutate(visit_type = 'visit_prior',
         day_cnt_prior = as.integer(day_cnt_prior))
c <- df %>% filter(visit_after == 'Y') %>%
  select(person, initial_event_date, visit_after_type, day_cnt_after) %>% 
  mutate(visit_type = 'visit_after',
         day_cnt_after = as.integer(day_cnt_after))

bind_rows(a,b,c) %>% 
  arrange(person) %>%
  mutate(visit_reason = dplyr::coalesce(type_initial, prior_visit_type, visit_after_type),
         visit_type   = dplyr::coalesce(visit_type),
         day_cnt      = dplyr::coalesce(day_cnt_after, day_cnt_prior)) %>% 
  select(person, initial_event_date,visit_type, visit_reason, day_cnt) %>% 
  replace_na(list(day_cnt = 0))
  
  • Related