Concatenate dataframe with difrent dimension-CodePudding

I have data that are stored in three different data frames with different lengths. Below you can see my data :

df<-data.frame(  
  retail_seling_price=c(10),
  quantity_10=c(1000)
)

df


df1<-data.frame(  
  retail_seling_price=c(100,200,300),
  quantity_18=c(1000,2000,3000)
)

df1

df2<-data.frame(  
  retail_seling_price=c(100,200,300,400),
  quantity_18=c(1000,2000,3000,0)
)

df2

So now I want to merge all these data frames into one data frame, as in the pic shown below.

So can anybody help me how to solve this problem ?

CodePudding user response：

Is your df2 supposed to have quantity_20 instead of quantity_18? If so (fixed for below), this is a combination of full_join (for each pair of frames) and Reduce on a list of frames to do the same for an arbitrary number of frames.

library(dplyr)
Reduce(function(a, b) full_join(a, b, by = "retail_seling_price"),
       list(df, df1, df2)) %>%
  mutate(across(everything(), ~ coalesce(., 0)))
#   retail_seling_price quantity_10 quantity_18 quantity_20
# 1                  10        1000           0           0
# 2                 100           0        1000        1000
# 3                 200           0        2000        2000
# 4                 300           0        3000        3000
# 5                 400           0           0           0

Frankly, we don't need dplyr for this, it's handy to use for fixing all of the NA values introduced by the initial join:

Reduce(function(a, b) full_join(a, b, by = "retail_seling_price"),
       list(df, df1, df2))
#   retail_seling_price quantity_10 quantity_18 quantity_20
# 1                  10        1000          NA          NA
# 2                 100          NA        1000        1000
# 3                 200          NA        2000        2000
# 4                 300          NA        3000        3000
# 5                 400          NA          NA           0

CodePudding user response：

Here is a another dplyr approach using bind_rows:

library(dplyr)

bind_rows(df, df1, df2) %>% 
  group_by(retail_seling_price) %>% 
  summarize(across(everything(), ~sum(., na.rm = TRUE)))

 retail_seling_price quantity_10 quantity_18 quantity_20
                <dbl>       <dbl>       <dbl>       <dbl>
1                  10        1000           0           0
2                 100           0        1000        1000
3                 200           0        2000        2000
4                 300           0        3000        3000
5                 400           0           0           0