Home > database >  R Replace column values in dataframe base on matching indexing column in separate dataframe
R Replace column values in dataframe base on matching indexing column in separate dataframe

Time:12-14

I have a dataframe 'df1' that looks like:

Number Variable1 Variable Variable3
1 A B C
2 A B C
3 A B C
4 A B C
5 A B C

And I have a second dataframe 'df2' that looks like:

Number Variable1 Variable Variable3
1 D E F
2 G H I
3 J K L
4 M N O
15 P Q R

I want to update the three Variable columns in df1 with the data in the Variable columns in df2 based on matching values in Number so that df1 ends up looking like:

Number Variable1 Variable Variable3
1 D E F
2 G H I
3 J K L
4 M N O
5 A B C

CodePudding user response:

You could use a power_left_join from powerjoin package with conflict = coalesce_yx like this:

library(powerjoin)
power_left_join(df1, df2, by = "Number", conflict = coalesce_yx)
#>   Number Variable1 Variable Variable3
#> 1      1         D        E         F
#> 2      2         G        H         I
#> 3      3         J        K         L
#> 4      4         M        N         O
#> 5      5         A        B         C

Created on 2022-12-13 with reprex v2.0.2


Data:

df1 <- read.table(text = 'Number    Variable1   Variable    Variable3
1   A   B   C
2   A   B   C
3   A   B   C
4   A   B   C
5   A   B   C
', header = TRUE)

df2 <- read.table(text = 'Number    Variable1   Variable    Variable3
1   D   E   F
2   G   H   I
3   J   K   L
4   M   N   O
15  P   Q   R
', header = TRUE)

CodePudding user response:

Would be helpful if dput(df) done. Have created another dataset for replication

df1<-cbind.data.frame(id=c(1:5),var1=rep("A",5),var2=rep("B",5),var3=rep("C",5))

df2<-cbind.data.frame(id=c(1:4,15),var1=LETTERS[7:11],var2=LETTERS[12:16],var3=LETTERS[16:20])


df1 %>%
  left_join(df2, by = "id") %>%
  mutate(var1 = coalesce(var1.y, var1.x),
         var2 = coalesce(var2.y, var2.x),
         var3 = coalesce(var3.y, var3.x)) %>%
  select(-var1.y, -var1.x,
         -var2.y, -var2.x,
         -var3.y, -var3.x)

CodePudding user response:

With dplyr, we can use rows_update

library(dplyr)
rows_update(df1, df2, by = 'Number', unmatched = "ignore")

-output

  Number Variable1 Variable Variable3
1      1         D        E         F
2      2         G        H         I
3      3         J        K         L
4      4         M        N         O
5      5         A        B         C

CodePudding user response:

You could update df1 while joining using data.table package and fcoalesce function:

library(data.table)

cols = c("Variable1", "Variable", "Variable3")

setDT(df1)[df2, (cols) := Map(fcoalesce, mget(paste0("i.", cols)), mget(cols)), on="Number"]
    

   Number Variable1 Variable Variable3
    <int>    <char>   <char>    <char>
1:      1         D        E         F
2:      2         G        H         I
3:      3         J        K         L
4:      4         M        N         O
5:      5         A        B         C  
  • Related