Home > front end >  sum cells across similar data frames within a list in R, by data frames names
sum cells across similar data frames within a list in R, by data frames names

Time:03-15

I have a list of data frames that look like this:

df1_BC <- data.frame(name=c("name1", "name2", "name3"),
                  year1=c(23, 45, 54),
                  year2=c(54, 23, 79),
                  year3=c(67, 29, 76))

df2_BC <- data.frame(name=c("name1", "name2", "name3"),
                  year1=c(93, 32, 56),
                  year2=c(82, 96, 72),
                  year3=c(54, 76, 19))

df3_BC <- data.frame(name=c("name1", "name2", "name3"),
                  year1=c(83, 41, 92),
                  year2=c(76, 73, 65),
                  year3=c(63, 62, 95))


df1_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
                  year1=c(23, 35, 54, 41),
                  year2=c(84, 23, 79, 69),
                  year3=c(97, 29, 76, 0))

df2_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
                  year1=c(93, 32, 56, 64),
                  year2=c(82, 96, 53, 0),
                  year3=c(54, 76, 19, 3))

df3_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
                  year1=c(83, 41, 92, 5),
                  year2=c(76, 3, 65, 82),
                  year3=c(3, 62, 95, 6)) 

list_dfs <- list(df1_BC, df2_BC, df3_BC, df1_BA, df2_BA, df3_BA)

As you can see, dataframes with the same sufix ('BA' or 'BC') have the same columns and number of rows.

What I want to do is to sum across the cells of the two groups of dataframes (the ones with the 'AB' suffix and the ones with the 'BC' suffix).

If I do it on the dataframes alone, without listing them, I get the expected result:

result_BA <- df1_BA[,-1]   df2_BA[,-1]   df3_BA[,-1]
result_BC <- df1_BC[,-1]   df2_BC[,-1]   df3_BC[,-1]

print(result_BA)
  year1 year2 year3
1   199   242   154
2   108   122   167
3   202   197   190
4   110   151     9

As you can also see, is necessary to keep the name column away to do the sum. EDIT: Then I would like to put it back. Something like this:

result_BA <- cbind(df1_BA[,-1], result_BA)

To have column of names added back to each corresponding dataframe in the list.

This is a simplified example from much larger lists, so doing it as a list and matching the dataframes to add up by suffix really simplifies the task.

Thanks!

CodePudding user response:

The list didn't have any names. We need to construct with names one option is to create a named list, split the list by the substring of the names, and use Reduce to the inner list elements

list_dfs <- list(df1_BC = df1_BC, df2_BC = df2_BC, df3_BC = df3_BC, 
   df1_BA = df1_BA, df2_BA = df2_BA, df3_BA = df3_BA)
lapply(split(list_dfs, sub(".*_", "", names(list_dfs))),
        \(x) Reduce(` `, lapply(x, `[`, -1)))

-output

$BA
  year1 year2 year3
1   199   242   154
2   108   122   167
3   202   197   190
4   110   151     9

$BC
  year1 year2 year3
1   199   212   184
2   118   192   167
3   202   216   190

Or this may be done with tidyverse using a group by approach

library(dplyr)
library(tidyr)
library(data.table)
list_dfs <- lst(df1_BC, df2_BC, df3_BC, df1_BA, df2_BA, df3_BA)
bind_rows(list_dfs, .id = 'name') %>% 
 separate(name, into = c("name1", "name2")) %>% 
 mutate(grp = rowid(name1, name2)) %>%
 group_by(name2, grp) %>% 
 summarise(across(where(is.numeric), sum), .groups = "drop") %>%
 select(-grp)

-output

# A tibble: 7 × 4
  name2 year1 year2 year3
  <chr> <dbl> <dbl> <dbl>
1 BA      199   242   154
2 BA      108   122   167
3 BA      202   197   190
4 BA      110   151     9
5 BC      199   212   184
6 BC      118   192   167
7 BC      202   216   190
  • Related