In R, how to join two data frames when either of two columns in the first data frame matches one column in the second

Time:01-02

Suppose we have two data frames:

df1=data.frame(col1=c("a","c","d"),
      col2=c("m","e","d")
      )


> df1
  col1 col2
1    a    m
2    c    e
3    d    d


df2=data.frame(coll1=c("m","f","d"),
      coll2=c(2,4,5)
)


> df2
  coll1 coll2
1     m     2
2     f     4
3     d     5


Is there a direct way to left join df1 and df2 when either col1 or col2 of df1 matches coll1 of df2 (without doing the left join twice)?

Desired result:

#first row: between 'a' and 'm' from df1, 'm' matches coll1 of df2
#second row: between 'c' and 'e' from df1, neither value matches
#third row: both values match
df3
col1 col2 output
a    m      2
c    e     NA
d    d      5

Thanks in advance!

CodePudding user response:

We can use match() to look up each of col1 and col2 in df2$coll1, then coalesce() to keep the first non-NA result:

library(dplyr)
df1 %>% 
  mutate(output = coalesce(df2$coll2[match(col1, df2$coll1)], 
     df2$coll2[match(col2, df2$coll1)]))

-output

 col1 col2 output
1    a    m      2
2    c    e     NA
3    d    d      5
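
For anyone who would rather not load dplyr, the same idea can be sketched in base R. This is only a minimal sketch, assuming df2 is built with data.frame() as in the question; the names idx1, idx2 and idx are made up for illustration. Note that it coalesces the row positions rather than the looked-up values, so it should differ from the coalesce() version only if coll2 itself contains NA.

```r
# Example data from the question (df2 built with data.frame()).
df1 <- data.frame(col1 = c("a", "c", "d"),
                  col2 = c("m", "e", "d"))
df2 <- data.frame(coll1 = c("m", "f", "d"),
                  coll2 = c(2, 4, 5))

# Row position in df2 of each df1 value; NA where there is no match.
# idx1 and idx2 are illustrative names, not from the original answer.
idx1 <- match(df1$col1, df2$coll1)
idx2 <- match(df1$col2, df2$coll1)

# Prefer the col1 match and fall back to the col2 match.
idx <- ifelse(is.na(idx1), idx2, idx1)

# Indexing with NA yields NA, which gives left-join behaviour.
df3 <- df1
df3$output <- df2$coll2[idx]
df3
```

As in the coalesce() version, a match on col1 takes priority over a match on col2 when both columns match different rows of df2.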

With dbplyr, we can also use the sql_on argument of left_join():

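library(dplyr)   # provides left_join(), select(), collect() and %>%
library(dbplyr)  # tbl_memdb() also needs the RSQLite package installed
# Copy both data frames into an in-memory SQLite database and join with OR
left_join(tbl_memdb(df1), tbl_memdb(df2),
          sql_on = "LHS.col1 = RHS.coll1 OR LHS.col2 = RHS.coll1") %>%
  select(col1, col2, output = coll2) %>%
  collect()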

-output

# A tibble: 3 × 3
  col1  col2  output
  <chr> <chr>  <dbl>
1 a     m          2
2 c     e         NA
3 d     d          5