Home > Back-end >  Retain unique records from a data set created with left_join
Retain unique records from a data set created with left_join

Time:07-04

Here is a small sample from two tables which I joined on gender:

df1 <- tibble(gender = c("M", "M"), ID = c(1,2))
df2 <- tibble(gender = c("M", "M", "M"), ID = c(30, 40, 50))

When I do a left_join, person 1 can match on either person 30, 40, 50 and so does person 2. I am looking for exactly one match for each person on the left. However if I group_by ID.x and slice the first option, then person 30 will be a match to both 1 and 2; and I need unique matches (i.e. each person on the left has one match on the right and vice versa). So in this case a better choice would be 1-30 and 2-40, e.g.

How can I achieve this efficiently with joins? The long way to be doing a loop and manually removing patient 30 after it was found as a match for person 1, and so on.

CodePudding user response:

You should need to create another variable to join by.

df1 <- df1 %>% group_by(gender) %>% mutate(ind = 1:n()) %>% ungroup()
df2 <- df2 %>% group_by(gender) %>% mutate(ind = 1:n()) %>% ungroup()

left_join(df1, df2, by = c("gender", "ind")) %>%
  select(-ind)

# # A tibble: 2 × 3
#   gender  ID.x  ID.y
#   <chr>  <dbl> <dbl>
# 1 M          1    30
# 2 M          2    40
  • Related