Home > Mobile >  dplyr solution to join datasets
dplyr solution to join datasets

Time:02-22

I have two datasets I want to combine:

dm <- data.frame('m'=c(paste0('m', seq(1,7))), 'F1'=c('a','a','k','a','a', 'a', 'a'), 'F2'=c('b','b','b','b','h', 'b', 'b'), 'F3'=c('c','d', 'c', 'f', 'c', 'd', 'd'),
                 'val.m'=c(1,1.5,3,4,2.3, 4,0))

and

do <- data.frame('m'= rep('obs',5), 'F1'=c('a','a','a','a','k'), 'F2'=c('h','b','b','b','b'), 'F3'=c('c','f', 'c', 'd', 'c'),  'val.m'=c(3,7,11,14,9))
> dm
   m F1 F2 F3 val.m
1 m1  a  b  c   1.0
2 m2  a  b  d   1.5
3 m3  k  b  c   3.0
4 m4  a  b  f   4.0
5 m5  a  h  c   2.3
6 m6  a  b  d   4.0
7 m7  a  b  d   0.0


> do
    m F1 F2 F3 val.m
1 obs  a  h  c     3
2 obs  a  b  f     7
3 obs  a  b  c    11
4 obs  a  b  d    14
5 obs  k  b  c     9

I want to add a column val.o to dm that takes the values from do according to columns F1,F2,F3. The desired output should be:

dnew <- data.frame('m'=c(paste0('m', seq(1,7))), 'F1'=c('a','a','k','a','a','a','a'), 
                   'F2'=c('b','b','b','b','h', 'b','b'), 'F3'=c('c','d', 'c', 'f', 'c', 'd', 'd'),'val.m'=c(1,1.5,3,4,2.3, 4,0), 'val.o'=c(11,14,9,7,3, 14,14) )
> dnew
   m F1 F2 F3 val.m val.o
1 m1  a  b  c   1.0    11
2 m2  a  b  d   1.5    14
3 m3  k  b  c   3.0     9
4 m4  a  b  f   4.0     7
5 m5  a  h  c   2.3     3
6 m6  a  b  d   4.0    14
7 m7  a  b  d   0.0    14

A merge(do,dm, by=c('F1','F2','F3') gets close enough, but I would like to know how to achieve it with dplyr?

thanks for any suggestion

CodePudding user response:

Use left join with all.x = TRUE (by default it is a inner join)

merge(do[-1], dm, by = c("F1", "F2", "F3"), all.x = TRUE)

Or with dplyr

library(dplyr)
left_join(do %>% select(-m), dm , by = c("F1", "F2", "F3")) %>%
    arrange(m)
   F1 F2 F3 val.m.x  m val.m.y
1  a  b  c      11 m1     1.0
2  a  b  d      14 m2     1.5
3  k  b  c       9 m3     3.0
4  a  b  f       7 m4     4.0
5  a  h  c       3 m5     2.3
6  a  b  d      14 m6     4.0
7  a  b  d      14 m7     0.0
  • Related