Home > Enterprise >  Copy value from one dataframe into another if multiple conditions are met (R)
Copy value from one dataframe into another if multiple conditions are met (R)

Time:12-01

I have two dataframes as such:

DF1<-data.frame(VAR1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
VAR2 = c('John','Bob','Hannah'), VAR3 = c(1,1,1,2),
VAR4=NA)

DF2<-data.frame(VAR1 = c(1,1,1,1,2,2,2,2,3,3,3,3), 
VAR2 = c('John','Bob','Hannah','Dave'),
VAR4 = c('A','B','C'))

I want to copy DF2$VAR4 into DF1$VAR4, for each row where the condition (DF1$VAR1==DF2$VAR1)&(DF1$VAR2==DF2$VAR2) is met, hence regardless of the value DF1$VAR3 takes. DF2$VAR4 only has one element for each pair (VAR1, VAR2), while DF1 has not. In fact, DF1$VAR3 counts the number of occurrences of each pair (DF1$VAR1, DF1$VAR2).

I tried (from another topic)

DF3<-merge(DF1, DF2[,'VAR4'], by= c('VAR1','VAR2'))

which should do the trick, but I get: "Error in fix.by(by.y,y): 'by' must specify a uniquely valid column"

I checked multiple times but there are no misspells, and both VAR1 and VAR2 exist in both dataframes, with consistent heading. I don't know what I'm missing here.

I am VERY new to R syntax, so I solved it with the nested loop:

for (i in DF1$VAR1){
  for (j in DF1$VAR2[DF1$VAR1==i]){
    DF1$VAR4[DF1$VAR1==i & DF1$VAR2==j] <- DF2$VAR4[DF2$VAR1==i & DF2$VAR2==j]
  }
}

Which works but is excruciatingly slow. I'm pretty sure the solution is trivial, but I just can't work it out. Thanks a lot for your time

CodePudding user response:

We could use rows_update (experimental) from dplyr:

library(dplyr)

DF1 |>
  rows_update(DF2, by = c("VAR1", "VAR2"), unmatched = "ignore")

Output:

   VAR1   VAR2 VAR3 VAR4
1     1   John    1    A
2     1    Bob    1    B
3     1 Hannah    1    C
4     1   John    2    A
5     2    Bob    1    C
6     2 Hannah    1    A
7     2   John    1    B
8     2    Bob    2    C
9     3 Hannah    1    B
10    3   John    1    C
11    3    Bob    1    A
12    3 Hannah    2    B

CodePudding user response:

you can use left_join :

library(dplyr) 

DF1<-data.frame(VAR1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
                VAR2 = c('John','Bob','Hannah'), VAR3 = c(1,1,1,2),
                VAR4=NA)

DF2<-data.frame(VAR1 = c(1,1,1,1,2,2,2,2,3,3,3,3), 
                VAR2 = c('John','Bob','Hannah','Dave'),
                VAR4 = c('A','B','C'))

DF3<-left_join(DF1, DF2, by= c('VAR1','VAR2'))%>%
   mutate(VAR4=VAR4.y)%>%
   select(-c(VAR4.x,VAR4.y))
DF3
# > DF3
# VAR1   VAR2 VAR3 VAR4
# 1     1   John    1    A
# 2     1    Bob    1    B
# 3     1 Hannah    1    C
# 4     1   John    2    A
# 5     2    Bob    1    C
# 6     2 Hannah    1    A
# 7     2   John    1    B
# 8     2    Bob    2    C
# 9     3 Hannah    1    B
# 10    3   John    1    C
# 11    3    Bob    1    A
# 12    3 Hannah    2    B
  • Related