I have a data frame with the next columns (df1):
Codes | Oxy | Date |
---|---|---|
100095 | 30% | 1 |
100096 | 50% | 1 |
100097 | 20% | 1 |
100095 | 40% | 2 |
100096 | 10% | 2 |
100097 | 20% | 2 |
And another one with addional data that I want to merge with (df2):
Code1 | Code2 | Spp |
---|---|---|
100095 | 345550 | Sz |
104568 | 100096 | Cg |
983488 | 100097 | As |
This second df have additional information but not the same number of rows. As you can see the codes are similar but not always from the same column. Then, what I want to do is to merge both df but saying "Codes" = "Code1" or "Code2". If the codes are similar with one of the two column then the rest of the data on that row should be merge.
I have tried with:
df3<-merge(x = df1, y = df2, all.x = TRUE)
but it only copy all the extra data to each row without taking account the codes. Hope I explain my self and thank you so much.
I want the spp variable add to each row that have the same code between both df, but considering one of the two codes column from the df2.
CodePudding user response:
Try this, iteratively joining based on equalities:
library(dplyr)
df1 %>%
left_join(df2, by = c("Codes" = "Code1")) %>%
left_join(df2, by = c("Codes" = "Code2")) %>%
mutate(Spp = coalesce(Spp.x, Spp.y)) %>%
select(-Code1, -Code2, -Spp.x, -Spp.y)
# Codes Oxy Date Spp
# 1 100095 30% 1 Sz
# 2 100096 50% 1 Cg
# 3 100097 20% 1 As
# 4 100095 40% 2 Sz
# 5 100096 10% 2 Cg
# 6 100097 20% 2 As
Alternatively, we can do a single join after pivoting the data. This has the advantage that if you actually have more than 2 Code#
fields, this will work with "1 or more", not hard-coding 2:
tidyr::pivot_longer(df2, -Spp, values_to = "Codes") %>%
select(-name) %>%
left_join(df1, ., by = "Codes")
# Codes Oxy Date Spp
# 1 100095 30% 1 Sz
# 2 100096 50% 1 Cg
# 3 100097 20% 1 As
# 4 100095 40% 2 Sz
# 5 100096 10% 2 Cg
# 6 100097 20% 2 As
One possible gotcha with this: if there are more than one Spp
for a particular code, then this will cause duplicate rows, resulting in more than 6 rows (given this df1
). If that's a concern, then you may need to do something like pivot_longer(...) %>% distinct(Codes, .keep_all = TRUE) %>% left_join(...)
, or choose another reduction method such that nothing in this new Codes
column is duplicated.