Here is a small sample from two tables which I joined on gender:
df1 <- tibble(gender = c("M", "M"), ID = c(1,2))
df2 <- tibble(gender = c("M", "M", "M"), ID = c(30, 40, 50))
When I do a left_join, person 1 can match on either person 30, 40, 50 and so does person 2. I am looking for exactly one match for each person on the left. However if I group_by ID.x and slice the first option, then person 30 will be a match to both 1 and 2; and I need unique matches (i.e. each person on the left has one match on the right and vice versa). So in this case a better choice would be 1-30 and 2-40, e.g.
How can I achieve this efficiently with joins? The long way to be doing a loop and manually removing patient 30 after it was found as a match for person 1, and so on.
CodePudding user response:
You should need to create another variable to join by.
df1 <- df1 %>% group_by(gender) %>% mutate(ind = 1:n()) %>% ungroup()
df2 <- df2 %>% group_by(gender) %>% mutate(ind = 1:n()) %>% ungroup()
left_join(df1, df2, by = c("gender", "ind")) %>%
select(-ind)
# # A tibble: 2 × 3
# gender ID.x ID.y
# <chr> <dbl> <dbl>
# 1 M 1 30
# 2 M 2 40