Home > Software engineering >  Can you left join in R (all.x = TRUE) and keep non-matching cases in their original positions, rathe
Can you left join in R (all.x = TRUE) and keep non-matching cases in their original positions, rathe

Time:03-26

Is it possible to left join/merge in R while preserving the original ordering of the by column in the left dataframe? The default behavior is that non-matching cases are appended to the end of the merged data frame, but this obviously changes the ordering.

Is there a simple solution I'm missing, or a clever way around this?

Here is an example with a non-matching case for id == 3, in the second row of the left dataframe df1:

df1 = data.frame(id = c(4,3,2,1),
                 color = c('purple','red','blue','green'))
df2 = data.frame(id = c(1,2,4),
                 weight = c(4.1, 5.3, 1.8))
df1
#   id   color
# 1  4  purple
# 2  3     red
# 3  2    blue
# 4  1   green

df2
#   id weight
# 1  1    4.1
# 2  2    5.3
# 3  4    1.8

If you merge with the default sort = TRUE, then the non-matching case appears in its place in the sorted by column. In this case, the third row:

merge(df1, df2, by = 'id', all.x = TRUE)
#   id  color weight
# 1  1  green    4.1
# 2  2   blue    5.3
# 3  3    red     NA
# 4  4 purple    1.8

If you want to preserve ordering with sort = FALSE, then non-matching cases are appended to the bottom of the merged dataframe, as it clearly states in the documentation for merge:

merge(df1, df2, by = 'id', all.x = TRUE, sort = FALSE)
#   id  color weight
# 1  4 purple    1.8
# 2  2   blue    5.3
# 3  1  green    4.1
# 4  3    red     NA

What I'd like is to preserve the original ordering of the left df1, including the non-matching cases, rather than preserve the ordering only for matching cases. My goal is to produce this:

#   id  color weight
# 1  4 purple    1.8
# 2  3    red     NA
# 3  2   blue    5.3
# 4  1  green    4.1

Any help is much appreciated!

CodePudding user response:

library(dplyr)
df1 = data.frame(id = c(4,3,2,1),
                 color = c('purple','red','blue','green'))
df2 = data.frame(id = c(1,2,4),
                 weight = c(4.1, 5.3, 1.8))

df1 %>%
  left_join(df2, 
            by = "id")

Result:

  id  color weight
1  4 purple    1.8
2  3    red     NA
3  2   blue    5.3
4  1  green    4.1
  •  Tags:  
  • r
  • Related