I have two dataframes as such:
DF1<-data.frame(VAR1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
VAR2 = c('John','Bob','Hannah'), VAR3 = c(1,1,1,2),
VAR4=NA)
DF2<-data.frame(VAR1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
VAR2 = c('John','Bob','Hannah','Dave'),
VAR4 = c('A','B','C'))
I want to copy DF2$VAR4 into DF1$VAR4, for each row where the condition (DF1$VAR1==DF2$VAR1)&(DF1$VAR2==DF2$VAR2) is met, hence regardless of the value DF1$VAR3 takes. DF2$VAR4 only has one element for each pair (VAR1, VAR2), while DF1 has not. In fact, DF1$VAR3 counts the number of occurrences of each pair (DF1$VAR1, DF1$VAR2).
I tried (from another topic)
DF3<-merge(DF1, DF2[,'VAR4'], by= c('VAR1','VAR2'))
which should do the trick, but I get: "Error in fix.by(by.y,y): 'by' must specify a uniquely valid column"
I checked multiple times but there are no misspells, and both VAR1 and VAR2 exist in both dataframes, with consistent heading. I don't know what I'm missing here.
I am VERY new to R syntax, so I solved it with the nested loop:
for (i in DF1$VAR1){
for (j in DF1$VAR2[DF1$VAR1==i]){
DF1$VAR4[DF1$VAR1==i & DF1$VAR2==j] <- DF2$VAR4[DF2$VAR1==i & DF2$VAR2==j]
}
}
Which works but is excruciatingly slow. I'm pretty sure the solution is trivial, but I just can't work it out. Thanks a lot for your time
CodePudding user response:
We could use rows_update
(experimental) from dplyr
:
library(dplyr)
DF1 |>
rows_update(DF2, by = c("VAR1", "VAR2"), unmatched = "ignore")
Output:
VAR1 VAR2 VAR3 VAR4
1 1 John 1 A
2 1 Bob 1 B
3 1 Hannah 1 C
4 1 John 2 A
5 2 Bob 1 C
6 2 Hannah 1 A
7 2 John 1 B
8 2 Bob 2 C
9 3 Hannah 1 B
10 3 John 1 C
11 3 Bob 1 A
12 3 Hannah 2 B
CodePudding user response:
you can use left_join :
library(dplyr)
DF1<-data.frame(VAR1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
VAR2 = c('John','Bob','Hannah'), VAR3 = c(1,1,1,2),
VAR4=NA)
DF2<-data.frame(VAR1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
VAR2 = c('John','Bob','Hannah','Dave'),
VAR4 = c('A','B','C'))
DF3<-left_join(DF1, DF2, by= c('VAR1','VAR2'))%>%
mutate(VAR4=VAR4.y)%>%
select(-c(VAR4.x,VAR4.y))
DF3
# > DF3
# VAR1 VAR2 VAR3 VAR4
# 1 1 John 1 A
# 2 1 Bob 1 B
# 3 1 Hannah 1 C
# 4 1 John 2 A
# 5 2 Bob 1 C
# 6 2 Hannah 1 A
# 7 2 John 1 B
# 8 2 Bob 2 C
# 9 3 Hannah 1 B
# 10 3 John 1 C
# 11 3 Bob 1 A
# 12 3 Hannah 2 B