Combine specific columns in different dataframes, and sum values based on row while keeping unique r-CodePudding

I'm trying to summarise a set of ~100 dataframes into one, the data has a unique identifier and a common identifier, and a relative quantity.

I want my final dataframe to sum all quanitites that match the common identifier, and then fill a dataframe with column names listed as the name given to the dataframe. I also only want to include the values, and ignore the other columns.

t1:

   Item    Code Sold
1 Apple Apple_1    0
2 Apple Apple_2    0
3 Apple Apple_3    1
4  Pear  Pear_1    1
5  Pear  Pear_2    4
6  Pear  Pear_3    2

    structure(list(Item = c("Apple", "Apple", "Apple", "Pear", "Pear", 
"Pear"), Code = c("Apple_1", "Apple_2", "Apple_3", "Pear_1", 
"Pear_2", "Pear_3"), Sold = c(0L, 0L, 1L, 1L, 4L, 2L)), class = "data.frame", row.names = c(NA, 
-6L))

t2:

        Item     Code Sold
1  Apple  Apple_1    1
2  Apple  Apple_2    3
3  Apple  Apple_3    0
4 Banana Banana_1    1
5 Banana Banana_2    1
6   Pear   Pear_1    4
7   Pear   Pear_2    2
8   Pear   Pear_3    1
9 Orange Orange_1    4

structure(list(Item = c("Apple", "Apple", "Apple", "Banana", 
"Banana", "Pear", "Pear", "Pear", "Orange"), Code = c("Apple_1", 
"Apple_2", "Apple_3", "Banana_1", "Banana_2", "Pear_1", "Pear_2", 
"Pear_3", "Orange_1"), Sold = c(1L, 3L, 0L, 1L, 1L, 4L, 2L, 1L, 
4L)), class = "data.frame", row.names = c(NA, -9L))

Result:

    Item t1 t2
1  Apple  1  4
2 Banana  0  2
3   Pear  7  7
4 Orange  0  4

structure(list(Item = c("Apple", "Banana", "Pear", "Orange"), 
t1 = c(1L, 0L, 7L, 0L), t2 = c(4L, 2L, 7L, 4L)), class = "data.frame", row.names = c(NA, 
-4L))

CodePudding user response：

Here is another approach with tidyverse. You can put your data.frames into a list. In this example, it is collecting objects starting with "t" and include a number in the global environment.

With bind_rows you can put your data.frames together with a column designating the source of the data (name of data.frame). After totaling up the sum of Sold, use pivot_wider with names of data.frames as columns.

library(tidyverse)

lst <- mget(ls(pattern = "t[0-9]"))

lst %>%
  bind_rows(.id = "source") %>%
  group_by(source, Item) %>%
  summarise(total = sum(Sold)) %>%
  pivot_wider(id_cols = Item, names_from = source, values_from = total, values_fill = 0)

Output

  Item      t1    t2
  <chr>  <int> <int>
1 Apple      1     4
2 Pear       7     7
3 Banana     0     2
4 Orange     0     4

CodePudding user response：

A purrr-based solution. If your dataframes are in a named list:

library(tidyverse)

t3 <- dfs %>%
  imap(~ summarize(group_by(.x, Item), !!.y := sum(Sold))) %>%
  reduce(full_join, by = "Item") %>%
  mutate(across(!Item, ~ replace_na(.x, 0)))

# # A tibble: 4 x 3
#   Item      t1    t2
#   <chr>  <int> <int>
# 1 Apple      1     4
# 2 Pear       7     7
# 3 Banana     0     2
# 4 Orange     0     4

If the objects are loose in the environment, you can adapt the code to first collect them into a list:

df_names <- purrr::set_names(ls(pattern = "^t\\d $"))

t3 <- df_names %>%
  map(~ summarize(group_by(get(.x), Item), !!.x := sum(Sold))) %>%
  reduce(full_join, by = "Item") %>%
  mutate(across(!Item, ~ replace_na(.x, 0)))

# # A tibble: 4 x 3
#   Item      t1    t2
#   <chr>  <int> <int>
# 1 Apple      1     4
# 2 Pear       7     7
# 3 Banana     0     2
# 4 Orange     0     4

CodePudding user response：

You can use tidyverse for achieving the result you are looking for. You need to first group_by the two dfs using Item and summarise the Sold column.

The code will look like this:

library(tidyverse)


t1 <- t1 %>%
  group_by(Item) %>%
  summarise(Sold = sum(Sold))

t2 <- t2 %>%
  group_by(Item) %>%
  summarise(Sold = sum(Sold))

t3 <- merge(x = t1, y = t2, by = "Item", all.y = T)
t3[is.na(t3)] <- 0
colnames(t3) <- c("Item", "t1", "t2")

Output:

    Item t1 t2
1  Apple  1  4
2 Banana  0  2
3 Orange  0  4
4   Pear  7  7