I'm trying to summarise a set of ~100 dataframes into one, the data has a unique identifier and a common identifier, and a relative quantity.
I want my final dataframe to sum all quanitites that match the common identifier, and then fill a dataframe with column names listed as the name given to the dataframe. I also only want to include the values, and ignore the other columns.
t1:
Item Code Sold
1 Apple Apple_1 0
2 Apple Apple_2 0
3 Apple Apple_3 1
4 Pear Pear_1 1
5 Pear Pear_2 4
6 Pear Pear_3 2
structure(list(Item = c("Apple", "Apple", "Apple", "Pear", "Pear",
"Pear"), Code = c("Apple_1", "Apple_2", "Apple_3", "Pear_1",
"Pear_2", "Pear_3"), Sold = c(0L, 0L, 1L, 1L, 4L, 2L)), class = "data.frame", row.names = c(NA,
-6L))
t2:
Item Code Sold
1 Apple Apple_1 1
2 Apple Apple_2 3
3 Apple Apple_3 0
4 Banana Banana_1 1
5 Banana Banana_2 1
6 Pear Pear_1 4
7 Pear Pear_2 2
8 Pear Pear_3 1
9 Orange Orange_1 4
structure(list(Item = c("Apple", "Apple", "Apple", "Banana",
"Banana", "Pear", "Pear", "Pear", "Orange"), Code = c("Apple_1",
"Apple_2", "Apple_3", "Banana_1", "Banana_2", "Pear_1", "Pear_2",
"Pear_3", "Orange_1"), Sold = c(1L, 3L, 0L, 1L, 1L, 4L, 2L, 1L,
4L)), class = "data.frame", row.names = c(NA, -9L))
Result:
Item t1 t2
1 Apple 1 4
2 Banana 0 2
3 Pear 7 7
4 Orange 0 4
structure(list(Item = c("Apple", "Banana", "Pear", "Orange"),
t1 = c(1L, 0L, 7L, 0L), t2 = c(4L, 2L, 7L, 4L)), class = "data.frame", row.names = c(NA,
-4L))
CodePudding user response:
Here is another approach with tidyverse
. You can put your data.frames into a list. In this example, it is collecting objects starting with "t" and include a number in the global environment.
With bind_rows
you can put your data.frames together with a column designating the source of the data (name of data.frame). After totaling up the sum of Sold
, use pivot_wider
with names of data.frames as columns.
library(tidyverse)
lst <- mget(ls(pattern = "t[0-9]"))
lst %>%
bind_rows(.id = "source") %>%
group_by(source, Item) %>%
summarise(total = sum(Sold)) %>%
pivot_wider(id_cols = Item, names_from = source, values_from = total, values_fill = 0)
Output
Item t1 t2
<chr> <int> <int>
1 Apple 1 4
2 Pear 7 7
3 Banana 0 2
4 Orange 0 4
CodePudding user response:
A purrr
-based solution. If your dataframes are in a named list:
library(tidyverse)
t3 <- dfs %>%
imap(~ summarize(group_by(.x, Item), !!.y := sum(Sold))) %>%
reduce(full_join, by = "Item") %>%
mutate(across(!Item, ~ replace_na(.x, 0)))
# # A tibble: 4 x 3
# Item t1 t2
# <chr> <int> <int>
# 1 Apple 1 4
# 2 Pear 7 7
# 3 Banana 0 2
# 4 Orange 0 4
If the objects are loose in the environment, you can adapt the code to first collect them into a list:
df_names <- purrr::set_names(ls(pattern = "^t\\d $"))
t3 <- df_names %>%
map(~ summarize(group_by(get(.x), Item), !!.x := sum(Sold))) %>%
reduce(full_join, by = "Item") %>%
mutate(across(!Item, ~ replace_na(.x, 0)))
# # A tibble: 4 x 3
# Item t1 t2
# <chr> <int> <int>
# 1 Apple 1 4
# 2 Pear 7 7
# 3 Banana 0 2
# 4 Orange 0 4
CodePudding user response:
You can use tidyverse
for achieving the result you are looking for.
You need to first group_by
the two df
s using Item
and summarise
the Sold
column.
The code will look like this:
library(tidyverse)
t1 <- t1 %>%
group_by(Item) %>%
summarise(Sold = sum(Sold))
t2 <- t2 %>%
group_by(Item) %>%
summarise(Sold = sum(Sold))
t3 <- merge(x = t1, y = t2, by = "Item", all.y = T)
t3[is.na(t3)] <- 0
colnames(t3) <- c("Item", "t1", "t2")
Output:
Item t1 t2
1 Apple 1 4
2 Banana 0 2
3 Orange 0 4
4 Pear 7 7