Having two dataframes with the same key id column:
dfnames1 <- data.frame(id = c(1,2,3,4), name1 = c("Helen", "Von", "Erik", "Brook", "Adel"), gender = c("F", "Neutral", "M", "Neutral", "F"))
dfnames2 <- data.frame(id = c(1,2,3,4), name2 = c("Helen", "Von", "Erik", "Brook", "Adel"), gender2 = c("Neutral", "M", "M", "Uni", "M"))
How is it possible to merge them into one data frame and for gender column check if it is "Neutral" label in one of the two dataframe and has another of the values of "F", "M" or "Uni" keep this label, if it is Neutral in both dataframes keep as it is and if the is a case of "F" and "M" or vice versus keep it as FM or MF.
Example of expected output:
dfnames <- data.frame(id = c(1,2,3,4), name = c("Helen", "Von", "Erik", "Brook", "Adel"), gender = c("F", "M", "M", "M", "FM"))
CodePudding user response:
Using dplyr we could it this way:
library(dplyr)
left_join(dfnames1, dfnames2) %>%
mutate(across(starts_with("gender"), ~ifelse(. == "Neutral", NA_character_, .)),
x = coalesce(gender, gender2),
x = ifelse(!is.na(gender2) & x != gender2, paste0(x, gender2), x)) %>%
select(id, name=name1, gender = x)
Joining, by = "id"
id name gender
1 1 Helen F
2 2 Von M
3 3 Erik M
4 4 Brook Uni
5 5 Adel FM
CodePudding user response:
Using the inputs shown in the Note at the end, create a state matrix (actually a data frame here) and use it to set gender
library(dplyr)
Lines <- " M F Neutral Uni
M M MF M M
F MF F F F
Neutral M F Neutral Uni
Uni M F Uni Uni"
State <- read.table(text = Lines, header = TRUE)
left_join(dfnames1, dfnames2, by = "id") %>%
rowwise %>%
mutate(gender = State[gender, gender2]) %>%
ungroup %>%
select(id, gender)
giving:
# A tibble: 5 × 2
id gender
<dbl> <chr>
1 1 F
2 2 M
3 3 M
4 4 Uni
5 5 MF
Note
dfnames1 <- structure(list(id = c(1, 2, 3, 4, 5), name1 = c("Helen", "Von",
"Erik", "Brook", "Adel"), gender = c("F", "Neutral", "M", "Neutral",
"F")), class = "data.frame", row.names = c(NA, -5L))
dfnames2 <- structure(list(id = c(1, 2, 3, 4, 5), name2 = c("Helen", "Von",
"Erik", "Brook", "Adel"), gender2 = c("Neutral", "M", "M", "Uni",
"M")), class = "data.frame", row.names = c(NA, -5L))