Home > other >  Concatenate dataframe with difrent dimension
Concatenate dataframe with difrent dimension

Time:01-24

I have data that are stored in three different data frames with different lengths. Below you can see my data :

df<-data.frame(  
  retail_seling_price=c(10),
  quantity_10=c(1000)
)

df


df1<-data.frame(  
  retail_seling_price=c(100,200,300),
  quantity_18=c(1000,2000,3000)
)

df1

df2<-data.frame(  
  retail_seling_price=c(100,200,300,400),
  quantity_18=c(1000,2000,3000,0)
)

df2

So now I want to merge all these data frames into one data frame, as in the pic shown below.

enter image description here

So can anybody help me how to solve this problem ?

CodePudding user response:

Is your df2 supposed to have quantity_20 instead of quantity_18? If so (fixed for below), this is a combination of full_join (for each pair of frames) and Reduce on a list of frames to do the same for an arbitrary number of frames.

library(dplyr)
Reduce(function(a, b) full_join(a, b, by = "retail_seling_price"),
       list(df, df1, df2)) %>%
  mutate(across(everything(), ~ coalesce(., 0)))
#   retail_seling_price quantity_10 quantity_18 quantity_20
# 1                  10        1000           0           0
# 2                 100           0        1000        1000
# 3                 200           0        2000        2000
# 4                 300           0        3000        3000
# 5                 400           0           0           0

Frankly, we don't need dplyr for this, it's handy to use for fixing all of the NA values introduced by the initial join:

Reduce(function(a, b) full_join(a, b, by = "retail_seling_price"),
       list(df, df1, df2))
#   retail_seling_price quantity_10 quantity_18 quantity_20
# 1                  10        1000          NA          NA
# 2                 100          NA        1000        1000
# 3                 200          NA        2000        2000
# 4                 300          NA        3000        3000
# 5                 400          NA          NA           0

CodePudding user response:

Here is a another dplyr approach using bind_rows:

library(dplyr)

bind_rows(df, df1, df2) %>% 
  group_by(retail_seling_price) %>% 
  summarize(across(everything(), ~sum(., na.rm = TRUE)))
 retail_seling_price quantity_10 quantity_18 quantity_20
                <dbl>       <dbl>       <dbl>       <dbl>
1                  10        1000           0           0
2                 100           0        1000        1000
3                 200           0        2000        2000
4                 300           0        3000        3000
5                 400           0           0           0
  • Related