Home > Enterprise >  how to use multiple left_joins without leftover NAS
how to use multiple left_joins without leftover NAS

Time:11-14

I am working with a handful of dataframes whose information I would like to combine.

df1 <- data.frame(id = c(1, 1, 2, 2), year = c(1990, 2000, 1990, 2000))
df1990 <- data.frame(id = c(1, 2), year = c(1990, 1990), X= c(1, 2))
df2000 <- data.frame(id = c(1, 2), year = c(2000, 2000), X= c(1, 2))

Above is code for creating toy inputs. I want to append the information in df2 and df3 to df1, resulting in a dataframe like this

df <- data.frame(id = c(1, 1, 2, 2), year = c(1990, 2000, 1990, 2000), X = c(1, 1, 2, 2))

To do this, my first thought was to use left_join() but I can only do this successfully once -- it works with the first attempted merge, but the NAs remain NAs when I try to do a second merge. So I run:

df <- left_join(df1, df1990)
df <- left_join(df, df2000)

But I still have NAs. Any idea how I can fix this?

CodePudding user response:

Use rows_update:

library(dplyr)
left_join(df1, df3) %>% 
  rows_update(df2)

output

Joining, by = "X"
Matching, by = "X"
  X X0 X2
1 1  1  1
2 2  2  2

CodePudding user response:

As suggested by TarJae in comments, create the right-hand side of the join, then join only once.

Assuming your dfs are all named dfYEAR, you can pull them out of the workspace

year_dfs <- lapply(grep("df[0-9]{4}", ls(), value = TRUE), get)

(even better would be to load them into a list to begin with)

Then join the two tables

# base-r
merge(df1, do.call(rbind, year_dfs))

# dplyr
year_dfs |> bind_rows() |> right_join(df1)

Note that if you have non-unique combinations of id and year in the year_dfs, you will end up with more rows than df1 started with.

  • Related