I'm trying to merge two dataframes by a common column, only matching rows which meet a certain criteria in another column:
df1 <- data.frame("id" = c("1001","1002","1003", "1004", "1005"),
"var1" = c(1,2,1,2,2))
df2 <- data.frame("id" = c("1001","1002","1003", "1004", "1005"),
"var2" = c(121,334,565,77,12))
I want to merge df2 into df1 based on id when var1=1, returning NA for non matches. So it should look like:
df3 <- data.frame("id" = c("1001","1002","1003", "1004", "1005"),
"var1" = c(1,2,1,2,2),
"var2" = c(121,NA,565,NA,NA))
I can't find much guidance on this online
Thanks!
CodePudding user response:
Do a join and replace
library(dplyr)
left_join(df1, df2, by = 'id') %>%
mutate(var2 = replace(var2, var1 == 2, NA))
-output
id var1 var2
1 1001 1 121
2 1002 2 NA
3 1003 1 565
4 1004 2 NA
5 1005 2 NA
Or using base R
df2$var2[df2$id %in% df1$id[df1$var1 == 2]] <- NA
merge(df1, df2)
id var1 var2
1 1001 1 121
2 1002 2 NA
3 1003 1 565
4 1004 2 NA
5 1005 2 NA
CodePudding user response:
Here is an alternative approach:
library(dplyr)
merge(df1, df2, by = "id") %>%
mutate(var2 = ifelse(var1==1, var2, NA_real_))
id var1 var2
1 1001 1 121
2 1002 2 NA
3 1003 1 565
4 1004 2 NA
5 1005 2 NA
CodePudding user response:
My first thought is to use the base function merge and then store NA values where var1 doesn't meet you criterion, like follows:
df3 <- merge(df1,df2,by = "id")
df3$var2[df3$var1 != 1] <- NA
If your actual data is more complicated, this may not scale up as well.