Home > Software engineering >  Insert value to df1 from df2 if df1 matches 2 columns in df2
Insert value to df1 from df2 if df1 matches 2 columns in df2

Time:05-30

I have a dataframe df1 like this:

ID1 <- c("AAA" , "CCD" , "AAA", "ABC", "AAA", "ABC", "ZZZ")
year1 <- c(2009,2008,2012,2012,2000,2012,2005)
df1 <- data.frame(ID1,year1)
df1

  ID1 year1
1 AAA  2009
2 CCD  2008
3 AAA  2012
4 ABC  2012
5 AAA  2000
6 ABC  2012
7 ZZZ  2005

Then, I have a dataframe df2 like this:

ID2 <- c("CCC","AAA","AAA","CCD","AAA","ABC","AAA","ABC","ABC","CCD")
year2 <- c(2005,2010,2012,2008,2009,2012,2000,2012,2004,2008)
xp <- c(5,6,7,8,5,4,3,4,3,8)
df2 <- data.frame(ID2,year2,xp)
df2

   ID2 year2 xp
1  CCC  2005  5
2  AAA  2010  6
3  AAA  2012  7
4  CCD  2008  8
5  AAA  2009  5
6  ABC  2012  4
7  AAA  2000  3
8  ABC  2012  4
9  ABC  2004  3
10 CCD  2008  8

I want to add the value of the column xp to df1 if df1$ID1 == df2$ID2 & df1$year1 == df2$year2. I tried:

df1$xp <- df2$xp[match(df1$ID1, df2$ID2) & match(df1$year1, df2$year2)] 

which won't work. The result should be:

  ID1 year1 xp
1 AAA  2009  5
2 CCD  2008  8
3 AAA  2012  7
4 ABC  2012  4
5 AAA  2000  3
6 ABC  2012  4
7 ZZZ  2005 NA

CodePudding user response:

A possible solution, based on dplyr::left_join:

library(dplyr)

left_join(df1, df2, by = c("ID1" = "ID2", "year1" = "year2")) %>% 
  distinct

#>   ID1 year1 xp
#> 1 AAA  2009  5
#> 2 CCD  2008  8
#> 3 AAA  2012  7
#> 4 ABC  2012  4
#> 5 AAA  2000  3
#> 6 ZZZ  2005 NA

Or using base R:

unique(merge(df1, df2, by.x = c("ID1", "year1"), by.y = c("ID2", "year2"), 
    all.x = T))

#>    ID1 year1 xp
#> 1  AAA  2000  3
#> 2  AAA  2009  5
#> 3  AAA  2012  7
#> 4  ABC  2012  4
#> 8  CCD  2008  8
#> 10 ZZZ  2005 NA

CodePudding user response:

Here is another possible solution using data.table:

library(data.table)

setDT(df1)[setDT(df2), on = c("ID1" = "ID2", "year1" = "year2"), xp := i.xp]

Output

   ID1 year1 xp
1: AAA  2009  5
2: CCD  2008  8
3: AAA  2012  7
4: ABC  2012  4
5: AAA  2000  3
6: ABC  2012  4
7: ZZZ  2005 NA

Or if you only want to keep the unique rows, as mentioned above, then you can use unique:

df1 <-
  unique(setDT(df1)[setDT(df2), on = c("ID1" = "ID2", "year1" = "year2"), xp := i.xp])

Output

   ID1 year1 xp
1: AAA  2009  5
2: CCD  2008  8
3: AAA  2012  7
4: ABC  2012  4
5: AAA  2000  3
6: ZZZ  2005 NA
  •  Tags:  
  • r
  • Related