I try to analyze medical data. I have Patients with certain lab values (data frame 1) and want to match to them their medication (data frame 2). My problem is: A single patient received sometimes multiple medications, for example, Patient 1 received medications "A" and "B". I need this variable for grouping my statistical analysis. How can I assign/merge values to a variable from a list with duplicates (same Patient-IDs)?
Example input data:
df_1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
Labvalues = c(54, 86, 21, 56, 75, 96, 45, 78, 95))
df_2 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3),
Meds = c("A", "B", "C", "D", "E", "F", "A", "C", "G", "B", "G"))
Hopefully output data frame:
df_3 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
Labvalues = c(54, 86, 21, 56, 75, 96, 45, 78, 95),
Meds = c("A;B", "B", "C;G", "D", "E", "F", "A", "C", "G"))
Thank you so much for your help!
I tried assign() function and different joint functions, but it always result in the creation of more rows. But I need one row per patient because I want to study the effect of the medication to the lab values, so the combination of drugs is relevant for me.
CodePudding user response:
Using a left_join
and a group_by
summarise
you could do:
library(dplyr)
df_1 |>
left_join(df_2, by = "ID") |>
group_by(ID) |>
summarise(across(everything(), paste, collapse = ";"))
#> # A tibble: 9 × 3
#> ID Labvalues Meds
#> <dbl> <chr> <chr>
#> 1 1 54;54 A;B
#> 2 2 86 B
#> 3 3 21;21 C;G
#> 4 4 56 D
#> 5 5 75 E
#> 6 6 96 F
#> 7 7 45 A
#> 8 8 78 C
#> 9 9 95 G