I have a list of data frames that look like this:
df1_BC <- data.frame(name=c("name1", "name2", "name3"),
year1=c(23, 45, 54),
year2=c(54, 23, 79),
year3=c(67, 29, 76))
df2_BC <- data.frame(name=c("name1", "name2", "name3"),
year1=c(93, 32, 56),
year2=c(82, 96, 72),
year3=c(54, 76, 19))
df3_BC <- data.frame(name=c("name1", "name2", "name3"),
year1=c(83, 41, 92),
year2=c(76, 73, 65),
year3=c(63, 62, 95))
df1_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
year1=c(23, 35, 54, 41),
year2=c(84, 23, 79, 69),
year3=c(97, 29, 76, 0))
df2_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
year1=c(93, 32, 56, 64),
year2=c(82, 96, 53, 0),
year3=c(54, 76, 19, 3))
df3_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
year1=c(83, 41, 92, 5),
year2=c(76, 3, 65, 82),
year3=c(3, 62, 95, 6))
list_dfs <- list(df1_BC, df2_BC, df3_BC, df1_BA, df2_BA, df3_BA)
As you can see, dataframes with the same sufix ('BA' or 'BC') have the same columns and number of rows.
What I want to do is to sum across the cells of the two groups of dataframes (the ones with the 'AB' suffix and the ones with the 'BC' suffix).
If I do it on the dataframes alone, without listing them, I get the expected result:
result_BA <- df1_BA[,-1] df2_BA[,-1] df3_BA[,-1]
result_BC <- df1_BC[,-1] df2_BC[,-1] df3_BC[,-1]
print(result_BA)
year1 year2 year3
1 199 242 154
2 108 122 167
3 202 197 190
4 110 151 9
As you can also see, is necessary to keep the name column away to do the sum. EDIT: Then I would like to put it back. Something like this:
result_BA <- cbind(df1_BA[,-1], result_BA)
To have column of names added back to each corresponding dataframe in the list.
This is a simplified example from much larger lists, so doing it as a list and matching the dataframes to add up by suffix really simplifies the task.
Thanks!
CodePudding user response:
The list
didn't have any names. We need to construct with names one option is to create a named list
, split
the list
by the substring of the names, and use Reduce
to
the inner list elements
list_dfs <- list(df1_BC = df1_BC, df2_BC = df2_BC, df3_BC = df3_BC,
df1_BA = df1_BA, df2_BA = df2_BA, df3_BA = df3_BA)
lapply(split(list_dfs, sub(".*_", "", names(list_dfs))),
\(x) Reduce(` `, lapply(x, `[`, -1)))
-output
$BA
year1 year2 year3
1 199 242 154
2 108 122 167
3 202 197 190
4 110 151 9
$BC
year1 year2 year3
1 199 212 184
2 118 192 167
3 202 216 190
Or this may be done with tidyverse
using a group by approach
library(dplyr)
library(tidyr)
library(data.table)
list_dfs <- lst(df1_BC, df2_BC, df3_BC, df1_BA, df2_BA, df3_BA)
bind_rows(list_dfs, .id = 'name') %>%
separate(name, into = c("name1", "name2")) %>%
mutate(grp = rowid(name1, name2)) %>%
group_by(name2, grp) %>%
summarise(across(where(is.numeric), sum), .groups = "drop") %>%
select(-grp)
-output
# A tibble: 7 × 4
name2 year1 year2 year3
<chr> <dbl> <dbl> <dbl>
1 BA 199 242 154
2 BA 108 122 167
3 BA 202 197 190
4 BA 110 151 9
5 BC 199 212 184
6 BC 118 192 167
7 BC 202 216 190