R convert long to wide but only for select rows-CodePudding

I have two data frames similar to these:

df<-structure(list(study_id = c(1, 2, 3, 5, 8, 10), mrn = c(123456, 
654321, 121212, 212121, 232323, 323232
), gender = c(1, 0, 0, 1, 1, 0), surg_date = structure(c(17003, 
17519, 17610, 16800, 18083, 18003), class = "Date"), tobacco_use_at_time_of_sur = c(1, 
0, 0, 1, 0, 1)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

df2<-structure(list(mrn = c("123456", "654321", "654321", 
"654321", "654321"), Procedures = c("Right Cranioplasty With Custom Made Implant", 
"Right Mesh Cranioplasty Insertion", "Removal, Right Sided Cranioplasty, Washout, Complex Wound Closure", 
"Revision Right-Sided Peek Cranioplasty\r\nRight Temporalis Muscle Resuspension; Reconstruction Of The Scalp With Scalp Flap 23 X 14 Cm", 
"Removal Of Right Sided Cranioplasty Implant, Wound Washout\r\nScalp Flap Closure Of Right Cranial Wound 19 X 14.5"
), surg_date = structure(c(1529452800, 1569196800, 1571875200, 
1610496000, 1613779200), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), `Patient Age` = c("68 yrs", "63 yrs", "63 yrs", "63 yrs", 
"63 yrs")), row.names = c(NA, -5L), class = c("tbl_df", "tbl", 
"data.frame"))

That I'm eventually trying to merge. "df" is explicitly 1 row per patient. "df2" is 'mostly' one row per patient but some patients are repeats and are found in multiple rows. I'm trying to merge them and I know there are going to be MANY steps, stuff like dropping columns I don't care about, making sure names of columns and classes line up, etc...

But my specific question today is, if I'm trying to make df2 explicitly one row per patient, and take the extra rows for each patient and convert them into extra columns, how can I do that?

I know pivot_wider would handle a lot of this, but I'm stuck on: Ideally the first row found for each patient in terms of earliest surg_date would stay in its original position.

Ideally my result would look like this:

One last little asterisk is that there may be several columns (besides just MRN) that I'd like to "leave alone".

CodePudding user response：

Not sure how this output is more informative, but you can try

library(tidyverse)
df2 %>% 
  group_by(mrn) %>% 
  mutate(n=1:n()) %>%
  pivot_wider(
    names_from = n,
    names_glue = "{n}_{.value}",
    values_from = c(Procedures, surg_date,`Patient Age`)
  ) %>% 
  mutate(mrn=as.numeric(mrn)) %>% 
  right_join(df,by = "mrn")

CodePudding user response：

Using tidyr::pivot_wider, create first an id by group and then pivot.

library(dplyr)
library(tidyr)
df2 %>% 
  mutate(mrn = as.numeric(mrn)) %>% 
  right_join(select(df, -surg_date), ., by = "mrn") %>% 
  group_by(mrn) %>% 
  mutate(id_count = seq(n()) - 1) %>%
  pivot_wider(names_from = id_count, values_from = c(Procedures, surg_date, `Patient Age`))

# A tibble: 2 x 16
# Groups:   mrn [2]
  study_id    mrn gender tobacco_use_at_time_of_sur Procedures_0  Procedures_1  Procedures_2   Procedures_3  surg_date_0         surg_date_1        
     <dbl>  <dbl>  <dbl>                      <dbl> <chr>         <chr>         <chr>          <chr>         <dttm>              <dttm>             
1        1 123456      1                          1 Right Cranio~ NA             NA             NA           2018-06-20 00:00:00 NA                 
2        2 654321      0                          0 Right Mesh C~ Removal, Rig~ "Revision Rig~ "Removal Of ~ 2019-09-23 00:00:00 2019-10-24 00:00:00
# ... with 6 more variables: surg_date_2 <dttm>, surg_date_3 <dttm>, Patient Age_0 <chr>, Patient Age_1 <chr>, Patient Age_2 <chr>,
#   Patient Age_3 <chr>