Home > OS >  How to merge three dataframes to avoid NA values (R)?
How to merge three dataframes to avoid NA values (R)?

Time:03-20

I have a long dataset with responses at time 1 and time 2.

I am trying to join two other dataframes (df1 and df2) to my primary dataframe (df) to create df_join. I've tried the following code, but it results in four separate columns: var1.x, var2.x, var1.y, var2.y where I would like just var1 and var2

What is the best approach here?

library(dplyr)
df <- tibble(ID = c(1001, 1001, 1002, 1002), time = c(1, 2, 1, 2), var3 = c("324908", "1231", "23532", "09094")
df1 <- tibble(ID = c(1001, 1002), time = c(1, 1), var1 = c(300, 400), var2 = c(300, 400))
df2 <- tibble(ID = c(1001, 1002), time = c(2, 2), var1 = c(310, 410), var2 = c(320, 420))

df_join <- df %>% 
  left_join(select(df1, ID, var1, var2), by = c("ID")) %>% 
  left_join(select(df2, ID, var1, var2), by = c("ID"))

My desired output would be:

df_join_desire <- tibble(ID = c(1001, 1001, 1002, 1002), time = c(1, 2, 1, 2), var1 = c(300, 310, 400, 410), var2 = c(300, 320, 400, 420), var3 = c(324908, 1231, 23532, 09094))

CodePudding user response:

You're encountering this error because df1 and df2 have the same column names, so you don't need to left_join them together, you can simply bind them (i.e., putting the new rows of df2 at the bottom of df1).

df_join <- left_join(df, bind_rows(df1, df2))

CodePudding user response:

You need to find append df1 and df2 to each other.

df_join <- df |> left_join(bind_rows(df1, df2), by = c("ID", "time"))
  •  Tags:  
  • r
  • Related