Hi I am looking to combine multiple data frames based on a common row name, but some of these data frames may have different row names that may exist in some, and not in others.
For example, I have these data frames
df1
Payments | Safeway | Costco | Fredmeyer |
---|---|---|---|
Gas | 3.34 | 3.23 | 3.37 |
Grocery | 10 | 22 | 17 |
Membership | 0 | 80 | 0 |
Utility | 89 | 67 | 78 |
Annual | 120 | 112 | 98 |
Another data frame: df2
Payments | SamsClub | Walmart |
---|---|---|
Gas | 3.39 | 3.09 |
Grocery | 13 | 28 |
Rating | 4.2 | 4.8 |
Basic | 4 | 7 |
A third data frame, df3
Payments | TraderJoes | WholeFoods | PayLess |
---|---|---|---|
Grocery | 15 | 28 | 9 |
Utility | 77 | 97 | 65 |
Annual | 0 | 0 | 5 |
With all of these data tables, I want to keep the different store names as the columns, and then combine based on the "Payments" name column. But if a certain data frame does not have that specific payment/value available, then I would still include it for that store, but place the value as 0 for it. In the end, all of these Payments would then be rearranged to alphabetical order, with the final result looking like below:
df4
Payments | Safeway | Costco | Fredmeyer | SamsClub | Walmart | TraderJoes | WholeFoods | Payless |
---|---|---|---|---|---|---|---|---|
Annual | 120 | 112 | 98 | 0 | 0 | 0 | 0 | 5 |
Basic | 0 | 0 | 0 | 5 | 7 | 0 | 0 | 0 |
Gas | 3.34 | 3.23 | 3.37 | 3.39 | 3.09 | 0 | 0 | 0 |
Grocery | 10 | 22 | 17 | 13 | 28 | 15 | 28 | 9 |
Membership | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 |
Rating | 0 | 0 | 0 | 4.2 | 4.8 | 0 | 0 | 0 |
Utility | 89 | 67 | 78 | 0 | 0 | 77 | 97 | 65 |
Please know if this is possible. I have been finding it difficult to solve this, and would appreciate any help on this. Thank you!
I also went ahead and prepared the code for the data frames if that helps with solving this faster:
# Make df1
Payments <- c("Gas", "Grocery", "Membership", "Utility", "Annual")
Safeway <- c(3.34, 10, 0, 89, 120)
Costco <- c(3.23, 22, 80, 67, 112)
Fredmeyer <- (3.37, 17, 0, 78, 98)
df1 <- data.frame(Payments, Safeway, Costco, Fredmeyer)
# Make df2
Payments <- c("Gas", "Grocery", "Rating", "Basic")
SamsClub <- c(3.39, 13, 4.2, 2)
Walmart <- c(3.09, 28, 4.8, 7)
df2 <- data.frame(Payments, SamsClub, Walmart)
# Make df3
Payments <- c("Grocery", "Utility", "Annual")
TraderJoes <- c(15, 77, 0)
WholeFoods <- c(28, 97, 0)
Payless <- c(9, 65, 5)
df3 <- data.frame(Payments, TraderJoes, Wholefoods, Payless)
CodePudding user response:
We can bind the datasets together with bind_rows
and do a group by sum
library(dplyr)
bind_rows(df1, df2, df3) %>%
group_by(Payments) %>%
summarise(across(everything(), sum, na.rm = TRUE))
-output
# A tibble: 7 × 9
Payments Safeway Costco Fredmeyer SamsClub Walmart TraderJoes WholeFoods Payless
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Annual 120 112 98 0 0 0 0 5
2 Basic 0 0 0 2 7 0 0 0
3 Gas 3.34 3.23 3.37 3.39 3.09 0 0 0
4 Grocery 10 22 17 13 28 15 28 9
5 Membership 0 80 0 0 0 0 0 0
6 Rating 0 0 0 4.2 4.8 0 0 0
7 Utility 89 67 78 0 0 77 97 65