Is it possible to left join/merge in R while preserving the original ordering of the by
column in the left dataframe? The default behavior is that non-matching cases are appended to the end of the merged data frame, but this obviously changes the ordering.
Is there a simple solution I'm missing, or a clever way around this?
Here is an example with a non-matching case for id == 3
, in the second row of the left dataframe df1
:
df1 = data.frame(id = c(4,3,2,1),
color = c('purple','red','blue','green'))
df2 = data.frame(id = c(1,2,4),
weight = c(4.1, 5.3, 1.8))
df1
# id color
# 1 4 purple
# 2 3 red
# 3 2 blue
# 4 1 green
df2
# id weight
# 1 1 4.1
# 2 2 5.3
# 3 4 1.8
If you merge with the default sort = TRUE
, then the non-matching case appears in its place in the sorted by
column. In this case, the third row:
merge(df1, df2, by = 'id', all.x = TRUE)
# id color weight
# 1 1 green 4.1
# 2 2 blue 5.3
# 3 3 red NA
# 4 4 purple 1.8
If you want to preserve ordering with sort = FALSE
, then non-matching cases are appended to the bottom of the merged dataframe, as it clearly states in the documentation for merge
:
merge(df1, df2, by = 'id', all.x = TRUE, sort = FALSE)
# id color weight
# 1 4 purple 1.8
# 2 2 blue 5.3
# 3 1 green 4.1
# 4 3 red NA
What I'd like is to preserve the original ordering of the left df1
, including the non-matching cases, rather than preserve the ordering only for matching cases. My goal is to produce this:
# id color weight
# 1 4 purple 1.8
# 2 3 red NA
# 3 2 blue 5.3
# 4 1 green 4.1
Any help is much appreciated!
CodePudding user response:
library(dplyr)
df1 = data.frame(id = c(4,3,2,1),
color = c('purple','red','blue','green'))
df2 = data.frame(id = c(1,2,4),
weight = c(4.1, 5.3, 1.8))
df1 %>%
left_join(df2,
by = "id")
Result:
id color weight
1 4 purple 1.8
2 3 red NA
3 2 blue 5.3
4 1 green 4.1