Home > Net >  Is there any command to merge dataframes with different values in a specific column?
Is there any command to merge dataframes with different values in a specific column?

Time:11-22

Having two dataframes with the same key id column:

dfnames1 <- data.frame(id = c(1,2,3,4), name1 = c("Helen", "Von", "Erik", "Brook", "Adel"), gender = c("F", "Neutral", "M", "Neutral", "F"))
dfnames2 <- data.frame(id = c(1,2,3,4), name2 = c("Helen", "Von", "Erik", "Brook", "Adel"), gender2 = c("Neutral", "M", "M", "Uni", "M"))

How is it possible to merge them into one data frame and for gender column check if it is "Neutral" label in one of the two dataframe and has another of the values of "F", "M" or "Uni" keep this label, if it is Neutral in both dataframes keep as it is and if the is a case of "F" and "M" or vice versus keep it as FM or MF.

Example of expected output:

dfnames <- data.frame(id = c(1,2,3,4), name = c("Helen", "Von", "Erik", "Brook", "Adel"), gender = c("F", "M", "M", "M", "FM"))

CodePudding user response:

Using dplyr we could it this way:

library(dplyr)

left_join(dfnames1, dfnames2) %>% 
  mutate(across(starts_with("gender"), ~ifelse(. == "Neutral", NA_character_, .)),
         x = coalesce(gender, gender2),
         x = ifelse(!is.na(gender2) & x != gender2, paste0(x, gender2), x)) %>% 
  select(id, name=name1, gender = x)
Joining, by = "id"
  id  name gender
1  1 Helen      F
2  2   Von      M
3  3  Erik      M
4  4 Brook    Uni
5  5  Adel     FM

CodePudding user response:

Using the inputs shown in the Note at the end, create a state matrix (actually a data frame here) and use it to set gender

library(dplyr)

Lines <- "        M   F Neutral Uni  
          M       M  MF M       M  
          F       MF F  F       F  
          Neutral M  F  Neutral Uni
          Uni     M  F  Uni     Uni"
State <- read.table(text = Lines, header = TRUE)

left_join(dfnames1, dfnames2, by = "id") %>%
  rowwise %>%
  mutate(gender = State[gender, gender2]) %>%
  ungroup %>%
  select(id, gender)

giving:

# A tibble: 5 × 2
     id gender
  <dbl> <chr> 
1     1 F     
2     2 M     
3     3 M     
4     4 Uni   
5     5 MF    

Note

dfnames1 <- structure(list(id = c(1, 2, 3, 4, 5), name1 = c("Helen", "Von", 
"Erik", "Brook", "Adel"), gender = c("F", "Neutral", "M", "Neutral", 
"F")), class = "data.frame", row.names = c(NA, -5L))

dfnames2 <- structure(list(id = c(1, 2, 3, 4, 5), name2 = c("Helen", "Von", 
"Erik", "Brook", "Adel"), gender2 = c("Neutral", "M", "M", "Uni", 
"M")), class = "data.frame", row.names = c(NA, -5L))
  •  Tags:  
  • r
  • Related