Home > database >  Merge two rows in columns based on condition
Merge two rows in columns based on condition

Time:07-18

I am new here, as well as to R, and I couldn't find any past queries that answered my following question, so I apologise if this has already been brought up before.

I am trying to merge the ID columns from two different datasets into one, but some of the IDs in the rows have been coded differently. I need to replace all the "LNZ_" IDs with the "LNZ.", however, I cannot figure out how I would go about doing this.

df_1 <- data.frame(ID_1 = c("LNZ_00001", "LNZ_00002", "LNZ_00003", "DFG00001", "CWD00001"),
                   Sex=c("M","F","F","M","F"))

df_2 <- data.frame(ID_2 = c("LNZ.00001", "LNZ.00002", "LNZ_00003", "DFG00001", "CWD00001"),
                   Type=c("S","S","B","B","B"), 
                   AGE=c(56,75,66,64,64))

The above is similar to the datasets that I have, only more scaled down. I hope this is somewhat clear, and any help would be appreciated.

Thanks!

CodePudding user response:

The issue with merging is that your ID columns have different formatting for some of the entries which are supposed to be matched. Therefore you need to modify those values to match before performing the merge. In the examples you gave, the difference is between a period separator (.) and an underscore (_). If your real data has more complex issues, you may need to use different code to clean up those values.

However, once that is resolved, you can perform your merge easily. Here I've used the {tidyverse} packages to accomplish both steps in one pipe chain.

library(tidyverse)

df_1 <- data.frame(ID_1 = c("LNZ_00001", "LNZ_00002", "LNZ_00003", "DFG00001", "CWD00001"), Sex=c("M","F","F","M","F"))

df_2 <- data.frame(ID_2 = c("LNZ.00001", "LNZ.00002", "LNZ_00003", "DFG00001", "CWD00001"), Type=c("S","S","B","B","B"), AGE=c(56,75,66,64,64))

df_2 %>% 
  mutate(ID_2 = str_replace(ID_2, "\\.", "\\_")) %>% 
  left_join(df_1, by = c("ID_2" = "ID_1"))
#>        ID_2 Type AGE Sex
#> 1 LNZ_00001    S  56   M
#> 2 LNZ_00002    S  75   F
#> 3 LNZ_00003    B  66   F
#> 4  DFG00001    B  64   M
#> 5  CWD00001    B  64   F

Created on 2022-07-17 by the reprex package (v2.0.1)

  • Related