How to merge two data.frame by "or"-CodePudding

I have a data frame with the next columns (df1):

Codes	Oxy	Date
100095	30%	1
100096	50%	1
100097	20%	1
100095	40%	2
100096	10%	2
100097	20%	2

And another one with addional data that I want to merge with (df2):

Code1	Code2	Spp
100095	345550	Sz
104568	100096	Cg
983488	100097	As

This second df have additional information but not the same number of rows. As you can see the codes are similar but not always from the same column. Then, what I want to do is to merge both df but saying "Codes" = "Code1" or "Code2". If the codes are similar with one of the two column then the rest of the data on that row should be merge.

I have tried with:

df3<-merge(x = df1, y = df2, all.x = TRUE)

but it only copy all the extra data to each row without taking account the codes. Hope I explain my self and thank you so much.

I want the spp variable add to each row that have the same code between both df, but considering one of the two codes column from the df2.

CodePudding user response：

Try this, iteratively joining based on equalities:

library(dplyr)
df1 %>%
  left_join(df2, by = c("Codes" = "Code1")) %>%
  left_join(df2, by = c("Codes" = "Code2")) %>%
  mutate(Spp = coalesce(Spp.x, Spp.y)) %>%
  select(-Code1, -Code2, -Spp.x, -Spp.y)
#    Codes Oxy Date Spp
# 1 100095 30%    1  Sz
# 2 100096 50%    1  Cg
# 3 100097 20%    1  As
# 4 100095 40%    2  Sz
# 5 100096 10%    2  Cg
# 6 100097 20%    2  As

Alternatively, we can do a single join after pivoting the data. This has the advantage that if you actually have more than 2 Code# fields, this will work with "1 or more", not hard-coding 2:

tidyr::pivot_longer(df2, -Spp, values_to = "Codes") %>%
  select(-name) %>%
  left_join(df1, ., by = "Codes")
#    Codes Oxy Date Spp
# 1 100095 30%    1  Sz
# 2 100096 50%    1  Cg
# 3 100097 20%    1  As
# 4 100095 40%    2  Sz
# 5 100096 10%    2  Cg
# 6 100097 20%    2  As

One possible gotcha with this: if there are more than one Spp for a particular code, then this will cause duplicate rows, resulting in more than 6 rows (given this df1). If that's a concern, then you may need to do something like pivot_longer(...) %>% distinct(Codes, .keep_all = TRUE) %>% left_join(...), or choose another reduction method such that nothing in this new Codes column is duplicated.