Insert value from df2 into matching df1 cell if exists, append df2 rows if not exists in df1, and in-CodePudding

How can I replace values in df1 with var values in df2 if they match BOTH code1 and code2, and append non-matches to the bottom keeping the order of df1 AND indexing whether the line was edited:

Sample data:

#create sample df1
df1 <- data.frame(code1=c("A","B","C","E","E","A"),code2=c("C3","C1","C4","C3","C5","C3"),var1=c(1,2,3,4,5,6),var2=c("vA","vB","vC","vD","vE","vF"),ver=1,stringsAsFactors = F)

#create sample df2
df2 <- data.frame(code1=c("E","B","Z","Z"),code2=c("C3","C1","C3","C5"),var1=c(9,10,1,5),var2=c("vX","vY","vD","vD"),ver=2,stringsAsFactors = F)

I'd like a function that creates the following result df:


df_desired <- data.frame(code1=c("A","B","C","E","E","A","Z","Z"),code2=c("C3","C1","C4","C3","C5","C3","C3","C5"),var1=c(1,10,3,9,5,6,1,5),var2=c("vA","vY","vC","vX","vE","vF","vD","vD"),ver=c(1,2,1,2,1,1,2,2),stringsAsFactors = F)

CodePudding user response：

Here's one approach. I'm sure you could make some of the steps a little more elegant, but it works.

f <- function(df1, df2) {
  a <- match(df1$code1, df2$code1)
  b <- match(df1$code2, df2$code2)
  is_match <- a == b & !is.na(a) & !is.na(b)
  where_match <- a[is_match]
  
  df1[is_match,] <- df2[where_match,]
  df1$edited <- is_match
  
  bind_rows(df1, df2[1:nrow(df2) %in% where_match,])
}

f(df1, df2)

  code1 code2 var1 var2 ver edited
1     A    C3    1   vA   1  FALSE
2     B    C1   10   vY   2   TRUE
3     C    C4    3   vC   1  FALSE
4     E    C3    9   vX   2   TRUE
5     E    C5    5   vE   1  FALSE
6     A    C3    6   vF   1  FALSE
7     E    C3    9   vX   2     NA
8     B    C1   10   vY   2     NA

CodePudding user response：

We could also use the (experimental) rows_upsert from dplyr:

library(dplyr)

df1 |>
  rows_upsert(df2, by = c("code1", "code2"))

Output:

  code1 code2 var1 var2 ver
1     A    C3    1   vA   1
2     B    C1   10   vY   2
3     C    C4    3   vC   1
4     E    C3    9   vX   2
5     E    C5    5   vE   1
6     A    C3    6   vF   1
7     Z    C3    1   vD   2
8     Z    C5    5   vD   2

I guess ver is the indexing variable you need.