Good evening, I am trying to merge a couple datasets and my normal tools in R are failing me tonight. Consider df1 and df2 below.
df1 = data.frame(a = c("a", "b", "c"),
b = c("1", "2", "3"),
c = c("x", "y", "z"))
df2 = data.frame(a = c("1", "b", "c", "d", "e"),
b = c("a", "2", "3", "4", "5"),
d = c("x2", "y2", "z2", "x3", "y3"))
In both cases, column a and b are supposed to act as grouping variables. For example, in df1, when a = a and b = 1, then c = x. Given the structure of the data I am working with, the actual order of a and b does not matter such that if a were to = 1 and b = a, then c will still equal x. Herein lies the problem, I would like to merge df1 with a new df, df2. df2, is similarly structured, but contains a new variable d. And, as can be seen df2 includes some a and b combinations that are backwards compared to A. In addition, B has some additional observations.
The desired dataframe I am looking for looks like this:
desired = data.frame(a = c("a", "b", "c"),
b = c("1", "2", "3"),
c = c("x", "y", "z"),
d = c("x2", "y2", "z2"))
As can be seen the original column structure from a b and c are preserved, and we have added in column D. However, we have not added any new observations.
I have tried using merge()
with varying combinations of by.x
, by.y
.
I also tried using various left_join
and inner_join
but I keep on getting whaping data sets that still aren't handling the mismatch in the a/b columns.
Thanks for any thoughts or help you might be able to provide.
Cheers
CodePudding user response:
You can left_join
df2 with df1 twice and use coalesce
-
library(dplyr)
df1 %>%
left_join(df2, by = c("a"="a", "b"="b")) %>%
left_join(df2, by = c("a"="b", "b"="a")) %>%
mutate(
d = coalesce(d.x, d.y)
) %>%
select(a,b,c,d)
a b c d
1 a 1 x x2
2 b 2 y y2
3 c 3 z z2
CodePudding user response:
Good morning. It appears, the actual order of a and b does matter. sort
your df2
, or maybe both.
df2[1:2] <- t(apply(df2[1:2], 1, sort, decreasing=TRUE))
merge(df1, df2)
# a b c d
# 1 a 1 x x2
# 2 b 2 y y2
# 3 c 3 z z2