Suppose we have two data frames:
df1=data.frame(col1=c("a","c","d"),
col2=c("m","e","d")
)
> df1
col1 col2
1 a m
2 c e
3 d d
df2=data.frame(coll1=c("m","f","d"),
coll2=c(2,4,5)
)
> df2
coll1 coll2
1 m 2
2 f 4
3 d 5
Is there a direct way to left join df1 and df2 so that a row matches when either col1 or col2 of df1 equals coll1 of df2 (without doing the left join twice)?
Desired result:
#first row: of 'a' and 'm' in df1, 'm' matches coll1 of df2
#second row: neither 'c' nor 'e' matches
#third row: both values match ('d')
df3
col1 col2 output
a m 2
c e NA
d d 5
Thanks in advance!
Answer:
We may use match with coalesce:
library(dplyr)
df1 %>%
  # look up col1 first; where it has no match, fall back to the col2 lookup
  mutate(output = coalesce(df2$coll2[match(col1, df2$coll1)],
                           df2$coll2[match(col2, df2$coll1)]))
-output
col1 col2 output
1 a m 2
2 c e NA
3 d d 5
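For comparison, here is a minimal base-R sketch of the same lookup without dplyr, assuming the two data frames from the question. It replaces coalesce() with an explicit "fill the NAs from the second match()" step, so col1 still takes priority over col2.

```r
df1 <- data.frame(col1 = c("a", "c", "d"), col2 = c("m", "e", "d"))
df2 <- data.frame(coll1 = c("m", "f", "d"), coll2 = c(2, 4, 5))

# Look up coll2 via col1 first; NA where col1 has no match in coll1
out <- df2$coll2[match(df1$col1, df2$coll1)]
# Fill the remaining NAs from the col2 lookup (what coalesce() does)
miss <- is.na(out)
out[miss] <- df2$coll2[match(df1$col2[miss], df2$coll1)]
df1$output <- out
df1
```

Like the match/coalesce answer, this always returns exactly one output per df1 row.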
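If we use dbplyr, we can also make use of the sql_on argument, which takes a custom SQL join predicate in which LHS and RHS refer to the left and right tables (tbl_memdb() copies each data frame into an in-memory SQLite database, so RSQLite must be installed):
library(dbplyr)
# the OR predicate joins a df1 row to any df2 row whose coll1 equals col1 or col2
left_join(tbl_memdb(df1), tbl_memdb(df2),
          sql_on = "LHS.col1 = RHS.coll1 OR LHS.col2 = RHS.coll1") %>%
  select(col1, col2, output = coll2) %>%
  collect()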
-output
# A tibble: 3 × 3
col1 col2 output
<chr> <chr> <dbl>
1 a m 2
2 c e NA
3 d d 5
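To illustrate what the OR predicate in sql_on means, here is a base-R sketch that emulates it as a cross join, a filter on the OR condition, and a left join back onto df1. The id column is a hypothetical helper added only for this illustration, and this is not how dbplyr or SQLite actually execute the query.

```r
df1 <- data.frame(col1 = c("a", "c", "d"), col2 = c("m", "e", "d"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(coll1 = c("m", "f", "d"), coll2 = c(2, 4, 5),
                  stringsAsFactors = FALSE)

df1$id <- seq_len(nrow(df1))  # hypothetical row id, used to join back
# by = NULL gives the Cartesian product of df1 and df2
cj <- merge(df1, df2, by = NULL)
# Keep only pairs that satisfy the OR predicate (the inner-join part)
hit <- cj[cj$col1 == cj$coll1 | cj$col2 == cj$coll1, c("id", "coll2")]
# Left join back on id so unmatched df1 rows survive with NA
res <- merge(df1, hit, by = "id", all.x = TRUE)
names(res)[names(res) == "coll2"] <- "output"
res$id <- NULL
res
```

One design difference follows from this: unlike the match/coalesce answer, an OR join returns one row per matching df2 row, so a df1 row whose col1 and col2 match two different coll1 values would appear twice in the result.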