How can I replace values in df1
with var values in df2
if they match BOTH code1 and code2, and append non-matches to the bottom keeping the order of df1 AND indexing whether the line was edited:
Sample data:
#create sample df1
df1 <- data.frame(code1=c("A","B","C","E","E","A"),code2=c("C3","C1","C4","C3","C5","C3"),var1=c(1,2,3,4,5,6),var2=c("vA","vB","vC","vD","vE","vF"),ver=1,stringsAsFactors = F)
#create sample df2
df2 <- data.frame(code1=c("E","B","Z","Z"),code2=c("C3","C1","C3","C5"),var1=c(9,10,1,5),var2=c("vX","vY","vD","vD"),ver=2,stringsAsFactors = F)
I'd like a function that creates the following result df:
df_desired <- data.frame(code1=c("A","B","C","E","E","A","Z","Z"),code2=c("C3","C1","C4","C3","C5","C3","C3","C5"),var1=c(1,10,3,9,5,6,1,5),var2=c("vA","vY","vC","vX","vE","vF","vD","vD"),ver=c(1,2,1,2,1,1,2,2),stringsAsFactors = F)
CodePudding user response:
Here's one approach. I'm sure you could make some of the steps a little more elegant, but it works.
f <- function(df1, df2) {
a <- match(df1$code1, df2$code1)
b <- match(df1$code2, df2$code2)
is_match <- a == b & !is.na(a) & !is.na(b)
where_match <- a[is_match]
df1[is_match,] <- df2[where_match,]
df1$edited <- is_match
bind_rows(df1, df2[1:nrow(df2) %in% where_match,])
}
f(df1, df2)
code1 code2 var1 var2 ver edited
1 A C3 1 vA 1 FALSE
2 B C1 10 vY 2 TRUE
3 C C4 3 vC 1 FALSE
4 E C3 9 vX 2 TRUE
5 E C5 5 vE 1 FALSE
6 A C3 6 vF 1 FALSE
7 E C3 9 vX 2 NA
8 B C1 10 vY 2 NA
CodePudding user response:
We could also use the (experimental) rows_upsert
from dplyr
:
library(dplyr)
df1 |>
rows_upsert(df2, by = c("code1", "code2"))
Output:
code1 code2 var1 var2 ver
1 A C3 1 vA 1
2 B C1 10 vY 2
3 C C4 3 vC 1
4 E C3 9 vX 2
5 E C5 5 vE 1
6 A C3 6 vF 1
7 Z C3 1 vD 2
8 Z C5 5 vD 2
I guess ver
is the indexing variable you need.