I am working with a handful of dataframes whose information I would like to combine.
df1 <- data.frame(id = c(1, 1, 2, 2), year = c(1990, 2000, 1990, 2000))
df1990 <- data.frame(id = c(1, 2), year = c(1990, 1990), X= c(1, 2))
df2000 <- data.frame(id = c(1, 2), year = c(2000, 2000), X= c(1, 2))
Above is code for creating toy inputs. I want to append the information in df2 and df3 to df1, resulting in a dataframe like this
df <- data.frame(id = c(1, 1, 2, 2), year = c(1990, 2000, 1990, 2000), X = c(1, 1, 2, 2))
To do this, my first thought was to use left_join()
but I can only do this successfully once -- it works with the first attempted merge, but the NAs remain NAs when I try to do a second merge.
So I run:
df <- left_join(df1, df1990)
df <- left_join(df, df2000)
But I still have NAs. Any idea how I can fix this?
CodePudding user response:
Use rows_update
:
library(dplyr)
left_join(df1, df3) %>%
rows_update(df2)
output
Joining, by = "X"
Matching, by = "X"
X X0 X2
1 1 1 1
2 2 2 2
CodePudding user response:
As suggested by TarJae in comments, create the right-hand side of the join, then join only once.
Assuming your dfs are all named dfYEAR
, you can pull them out of the workspace
year_dfs <- lapply(grep("df[0-9]{4}", ls(), value = TRUE), get)
(even better would be to load them into a list to begin with)
Then join the two tables
# base-r
merge(df1, do.call(rbind, year_dfs))
# dplyr
year_dfs |> bind_rows() |> right_join(df1)
Note that if you have non-unique combinations of id and year in the year_dfs
, you will end up with more rows than df1
started with.