Home > Software engineering >  R-code: Assign variable with duplicates to a existing dataframe
R-code: Assign variable with duplicates to a existing dataframe

Time:01-18

I try to analyze medical data. I have Patients with certain lab values (data frame 1) and want to match to them their medication (data frame 2). My problem is: A single patient received sometimes multiple medications, for example, Patient 1 received medications "A" and "B". I need this variable for grouping my statistical analysis. How can I assign/merge values to a variable from a list with duplicates (same Patient-IDs)?

Example input data:

df_1 <- data.frame(ID =  c(1, 2, 3, 4, 5, 6, 7, 8, 9), 
                   Labvalues = c(54, 86, 21, 56, 75, 96, 45, 78, 95))

df_2 <- data.frame(ID =  c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3), 
                   Meds = c("A", "B", "C", "D", "E", "F", "A", "C", "G", "B", "G"))

Hopefully output data frame:

df_3 <- data.frame(ID =  c(1, 2, 3, 4, 5, 6, 7, 8, 9), 
                   Labvalues = c(54, 86, 21, 56, 75, 96, 45, 78, 95), 
                   Meds = c("A;B", "B", "C;G", "D", "E", "F", "A", "C", "G"))

Thank you so much for your help!

I tried assign() function and different joint functions, but it always result in the creation of more rows. But I need one row per patient because I want to study the effect of the medication to the lab values, so the combination of drugs is relevant for me.

CodePudding user response:

Using a left_join and a group_by summarise you could do:

library(dplyr)

df_1 |> 
  left_join(df_2, by = "ID") |> 
  group_by(ID) |> 
  summarise(across(everything(), paste, collapse = ";"))
#> # A tibble: 9 × 3
#>      ID Labvalues Meds 
#>   <dbl> <chr>     <chr>
#> 1     1 54;54     A;B  
#> 2     2 86        B    
#> 3     3 21;21     C;G  
#> 4     4 56        D    
#> 5     5 75        E    
#> 6     6 96        F    
#> 7     7 45        A    
#> 8     8 78        C    
#> 9     9 95        G
  •  Tags:  
  • r
  • Related