Merge two data frames by one column with unique values-CodePudding

This should be easy to answer but I could not find this question. Let's say we have two data frames df1 and df2.

> df1 <- data.frame(a=c(1,2,3,4,5),
                    b=c(11,22,33,44,55))
> df1
  a  b
1 1 11
2 2 22
3 3 33
4 4 44
5 5 55

and

> df2 <- data.frame(a=c(1,1,1,2,2,3,4,4,4,5,5),
                    c=c("abc","abc","abc","def","def","ghi","jkl","jkl","jkl","mno","mno"),
                    d=c("x","y","z","x","y","x","x","y","z","x","y"))
> df2
   a   c d
1  1 abc x
2  1 abc y
3  1 abc z
4  2 def x
5  2 def y
6  3 ghi x
7  4 jkl x
8  4 jkl y
9  4 jkl z
10 5 mno x
11 5 mno y

I would like to merge both data frames so that only column 'c' is added to df1. Like this:

> df3 <- data.frame(a=c(1,2,3,4,5),
                    b=c(11,22,33,44,55),
                    c=c("abc","def","ghi","jkl","mno"))
> df3
  a  b   c
1 1 11 abc
2 2 22 def
3 3 33 ghi
4 4 44 jkl
5 5 55 mno

I already tried to use the merge function and a left join. In this case I get this result:

> df5 <- merge(x = df1, y = df2[ , c("a", "c")], by = "a", all.x = TRUE)
> df5
   a  b   c
1  1 11 abc
2  1 11 abc
3  1 11 abc
4  2 22 def
5  2 22 def
6  3 33 ghi
7  4 44 jkl
8  4 44 jkl
9  4 44 jkl
10 5 55 mno
11 5 55 mno

Thanks a lot for your help!

CodePudding user response：

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(11, 22, 33, 44, 55))
df2 <-
  data.frame(
    a = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5),
    c = c(
      "abc",
      "abc",
      "abc",
      "def",
      "def",
      "ghi",
      "jkl",
      "jkl",
      "jkl",
      "mno",
      "mno"
    ),
    d = c("x", "y", "z", "x", "y", "x", "x", "y", "z", "x", "y")
  )

df1 |> inner_join(df2, by = "a") |> 
  select(- d) |> 
  distinct()
#>   a  b   c
#> 1 1 11 abc
#> 2 2 22 def
#> 3 3 33 ghi
#> 4 4 44 jkl
#> 5 5 55 mno

^{Created on 2022-04-20 by the reprex package (v2.0.1)}

CodePudding user response：

One way to get this would be (using the same df1 and df2 as provided)

require(tidyverse)
df3 <- unique((inner_join(df1, select(df2, c("a","c")), by = c("a"))))

I used the inner join originally but left_join would work as well

Another way of doing would be to create a subset of df2

df2b <- df2 %>% 
  select(a,c) %>%
  unique()
df3b <- left_join(df1, df2b, by="a")

CodePudding user response：

I think you can use merge like below

> merge(df1,unique(df2[c("a","c")]))
  a  b   c
1 1 11 abc
2 2 22 def
3 3 33 ghi
4 4 44 jkl
5 5 55 mno