I have two data frames similar to these:
df<-structure(list(study_id = c(1, 2, 3, 5, 8, 10), mrn = c(123456,
654321, 121212, 212121, 232323, 323232
), gender = c(1, 0, 0, 1, 1, 0), surg_date = structure(c(17003,
17519, 17610, 16800, 18083, 18003), class = "Date"), tobacco_use_at_time_of_sur = c(1,
0, 0, 1, 0, 1)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
df2<-structure(list(mrn = c("123456", "654321", "654321",
"654321", "654321"), Procedures = c("Right Cranioplasty With Custom Made Implant",
"Right Mesh Cranioplasty Insertion", "Removal, Right Sided Cranioplasty, Washout, Complex Wound Closure",
"Revision Right-Sided Peek Cranioplasty\r\nRight Temporalis Muscle Resuspension; Reconstruction Of The Scalp With Scalp Flap 23 X 14 Cm",
"Removal Of Right Sided Cranioplasty Implant, Wound Washout\r\nScalp Flap Closure Of Right Cranial Wound 19 X 14.5"
), surg_date = structure(c(1529452800, 1569196800, 1571875200,
1610496000, 1613779200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Patient Age` = c("68 yrs", "63 yrs", "63 yrs", "63 yrs",
"63 yrs")), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))
That I'm eventually trying to merge. "df" is explicitly 1 row per patient. "df2" is 'mostly' one row per patient but some patients are repeats and are found in multiple rows. I'm trying to merge them and I know there are going to be MANY steps, stuff like dropping columns I don't care about, making sure names of columns and classes line up, etc...
But my specific question today is, if I'm trying to make df2 explicitly one row per patient, and take the extra rows for each patient and convert them into extra columns, how can I do that?
I know pivot_wider would handle a lot of this, but I'm stuck on: Ideally the first row found for each patient in terms of earliest surg_date would stay in its original position.
Ideally my result would look like this:
One last little asterisk is that there may be several columns (besides just MRN) that I'd like to "leave alone".
CodePudding user response:
Not sure how this output is more informative, but you can try
library(tidyverse)
df2 %>%
group_by(mrn) %>%
mutate(n=1:n()) %>%
pivot_wider(
names_from = n,
names_glue = "{n}_{.value}",
values_from = c(Procedures, surg_date,`Patient Age`)
) %>%
mutate(mrn=as.numeric(mrn)) %>%
right_join(df,by = "mrn")
CodePudding user response:
Using tidyr::pivot_wider
, create first an id by group and then pivot.
library(dplyr)
library(tidyr)
df2 %>%
mutate(mrn = as.numeric(mrn)) %>%
right_join(select(df, -surg_date), ., by = "mrn") %>%
group_by(mrn) %>%
mutate(id_count = seq(n()) - 1) %>%
pivot_wider(names_from = id_count, values_from = c(Procedures, surg_date, `Patient Age`))
# A tibble: 2 x 16
# Groups: mrn [2]
study_id mrn gender tobacco_use_at_time_of_sur Procedures_0 Procedures_1 Procedures_2 Procedures_3 surg_date_0 surg_date_1
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <dttm> <dttm>
1 1 123456 1 1 Right Cranio~ NA NA NA 2018-06-20 00:00:00 NA
2 2 654321 0 0 Right Mesh C~ Removal, Rig~ "Revision Rig~ "Removal Of ~ 2019-09-23 00:00:00 2019-10-24 00:00:00
# ... with 6 more variables: surg_date_2 <dttm>, surg_date_3 <dttm>, Patient Age_0 <chr>, Patient Age_1 <chr>, Patient Age_2 <chr>,
# Patient Age_3 <chr>