This should be easy to answer but I could not find this question. Let's say we have two data frames df1 and df2.
> df1 <- data.frame(a=c(1,2,3,4,5),
b=c(11,22,33,44,55))
> df1
a b
1 1 11
2 2 22
3 3 33
4 4 44
5 5 55
and
> df2 <- data.frame(a=c(1,1,1,2,2,3,4,4,4,5,5),
c=c("abc","abc","abc","def","def","ghi","jkl","jkl","jkl","mno","mno"),
d=c("x","y","z","x","y","x","x","y","z","x","y"))
> df2
a c d
1 1 abc x
2 1 abc y
3 1 abc z
4 2 def x
5 2 def y
6 3 ghi x
7 4 jkl x
8 4 jkl y
9 4 jkl z
10 5 mno x
11 5 mno y
I would like to merge both data frames so that only column 'c' is added to df1. Like this:
> df3 <- data.frame(a=c(1,2,3,4,5),
b=c(11,22,33,44,55),
c=c("abc","def","ghi","jkl","mno"))
> df3
a b c
1 1 11 abc
2 2 22 def
3 3 33 ghi
4 4 44 jkl
5 5 55 mno
I already tried to use the merge function and a left join. In this case I get this result:
> df5 <- merge(x = df1, y = df2[ , c("a", "c")], by = "a", all.x = TRUE)
> df5
a b c
1 1 11 abc
2 1 11 abc
3 1 11 abc
4 2 22 def
5 2 22 def
6 3 33 ghi
7 4 44 jkl
8 4 44 jkl
9 4 44 jkl
10 5 55 mno
11 5 55 mno
Thanks a lot for your help!
CodePudding user response:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(11, 22, 33, 44, 55))
df2 <-
data.frame(
a = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5),
c = c(
"abc",
"abc",
"abc",
"def",
"def",
"ghi",
"jkl",
"jkl",
"jkl",
"mno",
"mno"
),
d = c("x", "y", "z", "x", "y", "x", "x", "y", "z", "x", "y")
)
df1 |> inner_join(df2, by = "a") |>
select(- d) |>
distinct()
#> a b c
#> 1 1 11 abc
#> 2 2 22 def
#> 3 3 33 ghi
#> 4 4 44 jkl
#> 5 5 55 mno
Created on 2022-04-20 by the reprex package (v2.0.1)
CodePudding user response:
One way to get this would be (using the same df1
and df2
as provided)
require(tidyverse)
df3 <- unique((inner_join(df1, select(df2, c("a","c")), by = c("a"))))
I used the inner join originally but left_join would work as well
Another way of doing would be to create a subset of df2
df2b <- df2 %>%
select(a,c) %>%
unique()
df3b <- left_join(df1, df2b, by="a")
CodePudding user response:
I think you can use merge
like below
> merge(df1,unique(df2[c("a","c")]))
a b c
1 1 11 abc
2 2 22 def
3 3 33 ghi
4 4 44 jkl
5 5 55 mno