Home > Software design >  Combining multiple Dataframes with different row names
Combining multiple Dataframes with different row names

Time:07-05

Hi I am looking to combine multiple data frames based on a common row name, but some of these data frames may have different row names that may exist in some, and not in others.

For example, I have these data frames

df1

Payments Safeway Costco Fredmeyer
Gas 3.34 3.23 3.37
Grocery 10 22 17
Membership 0 80 0
Utility 89 67 78
Annual 120 112 98

Another data frame: df2

Payments SamsClub Walmart
Gas 3.39 3.09
Grocery 13 28
Rating 4.2 4.8
Basic 4 7

A third data frame, df3

Payments TraderJoes WholeFoods PayLess
Grocery 15 28 9
Utility 77 97 65
Annual 0 0 5

With all of these data tables, I want to keep the different store names as the columns, and then combine based on the "Payments" name column. But if a certain data frame does not have that specific payment/value available, then I would still include it for that store, but place the value as 0 for it. In the end, all of these Payments would then be rearranged to alphabetical order, with the final result looking like below:

df4

Payments Safeway Costco Fredmeyer SamsClub Walmart TraderJoes WholeFoods Payless
Annual 120 112 98 0 0 0 0 5
Basic 0 0 0 5 7 0 0 0
Gas 3.34 3.23 3.37 3.39 3.09 0 0 0
Grocery 10 22 17 13 28 15 28 9
Membership 0 80 0 0 0 0 0 0
Rating 0 0 0 4.2 4.8 0 0 0
Utility 89 67 78 0 0 77 97 65

Please know if this is possible. I have been finding it difficult to solve this, and would appreciate any help on this. Thank you!

I also went ahead and prepared the code for the data frames if that helps with solving this faster:

# Make df1
Payments <- c("Gas", "Grocery", "Membership", "Utility", "Annual")
Safeway <- c(3.34, 10, 0, 89, 120)
Costco <- c(3.23, 22, 80, 67, 112)
Fredmeyer <- (3.37, 17, 0, 78, 98)
df1 <- data.frame(Payments, Safeway, Costco, Fredmeyer)

# Make df2
Payments <- c("Gas", "Grocery", "Rating", "Basic")
SamsClub <- c(3.39, 13, 4.2, 2)
Walmart <- c(3.09, 28, 4.8, 7)
df2 <- data.frame(Payments, SamsClub, Walmart)

# Make df3
Payments <- c("Grocery", "Utility", "Annual")
TraderJoes <- c(15, 77, 0)
WholeFoods <- c(28, 97, 0)
Payless <- c(9, 65, 5)
df3 <- data.frame(Payments, TraderJoes, Wholefoods, Payless)

CodePudding user response:

We can bind the datasets together with bind_rows and do a group by sum

library(dplyr)
bind_rows(df1, df2, df3) %>%
  group_by(Payments) %>% 
   summarise(across(everything(), sum, na.rm = TRUE))

-output

# A tibble: 7 × 9
  Payments   Safeway Costco Fredmeyer SamsClub Walmart TraderJoes WholeFoods Payless
  <chr>        <dbl>  <dbl>     <dbl>    <dbl>   <dbl>      <dbl>      <dbl>   <dbl>
1 Annual      120    112        98        0       0             0          0       5
2 Basic         0      0         0        2       7             0          0       0
3 Gas           3.34   3.23      3.37     3.39    3.09          0          0       0
4 Grocery      10     22        17       13      28            15         28       9
5 Membership    0     80         0        0       0             0          0       0
6 Rating        0      0         0        4.2     4.8           0          0       0
7 Utility      89     67        78        0       0            77         97      65
  • Related