I want to combine (reduce) a list of data frames into one data frame and summarize the data in the same step. The data frames are output from a simulation, so each one has the same structure: a Group column followed by two value columns whose values vary from run to run.
Minimal Reproducible Example
df_list <- list(structure(list(Group = c("A", "B", "C"), Top_Group = c(1L,
0L, 0L), Efficiency = c(0.464688158128411, 0.652386676520109,
0.282913417555392)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(Group = c("A", "B", "C"
), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.120292583014816,
0.0356206290889531, 0.37196880299598)), row.names = c(NA, -3L
), class = c("tbl_df", "tbl", "data.frame")), structure(list(
Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.261322160949931,
0.383351784432307, 0.754808459430933)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame")))
What I Have Tried
I know I could bind the data together, then group and summarize.
library(tidyverse)
df_list %>%
bind_rows() %>%
group_by(Group) %>%
summarise(Top_Group = sum(Top_Group), Efficiency = max(Efficiency))
# Group Top_Group Efficiency
# <chr> <int> <dbl>
#1 A 1 0.465
#2 B 2 0.652
#3 C 0 0.755
I was hoping there was some way to use something like reduce; however, I can only get it to work for pulling out one column (like Top_Group, shown here), and I am unsure how to apply it across all columns (if that is possible) and return a data frame instead of vectors.
df_list %>%
map(2) %>%
reduce(`+`)
# [1] 1 2 0
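Extending that attempt, one option is to reduce each value column on its own and rebuild the tibble afterwards. This is a sketch that assumes every data frame in df_list lists the same groups in the same row order (true for the example data); it checks that alignment first, pulls each column out with map(), and combines the vectors with reduce(). df_list is rebuilt with tibble() from the example values so the block runs on its own.

```r
library(purrr)
library(tibble)

# df_list rebuilt from the example values above
df_list <- list(
  tibble(Group = c("A", "B", "C"), Top_Group = c(1L, 0L, 0L),
         Efficiency = c(0.464688158128411, 0.652386676520109, 0.282913417555392)),
  tibble(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L),
         Efficiency = c(0.120292583014816, 0.0356206290889531, 0.37196880299598)),
  tibble(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L),
         Efficiency = c(0.261322160949931, 0.383351784432307, 0.754808459430933))
)

# Assumption: all data frames share the same Group column in the same order
stopifnot(all(map_lgl(df_list, ~ identical(.x$Group, df_list[[1]]$Group))))

result <- tibble(
  Group = df_list[[1]]$Group,
  # elementwise sum of Top_Group across all runs
  Top_Group = reduce(map(df_list, "Top_Group"), `+`),
  # elementwise maximum of Efficiency across all runs
  Efficiency = reduce(map(df_list, "Efficiency"), pmax)
)
result
```

Each column gets its own combining function, which is the part a single reduce() over whole data frames cannot express without a custom function.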
Expected Output
Group Top_Group Efficiency
<chr> <int> <dbl>
1 A 1 0.465
2 B 2 0.652
3 C 0 0.755
Answer:
Based on the OP's code, different functions are applied to different columns (sum for Top_Group, max for Efficiency), so we have to apply the corresponding elementwise functions to each column individually:
library(purrr)
library(tibble)
# keep Group, add Top_Group, and take the elementwise max of Efficiency
reduce(df_list, ~ tibble(.x[1], .x[2] + .y[2], pmax(.x[3], .y[3])))
-output
# A tibble: 3 × 3
Group Top_Group Efficiency
<chr> <int> <dbl>
1 A 1 0.465
2 B 2 0.652
3 C 0 0.755
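Because this answer indexes columns by position, a variant that names the function for each column may be easier to maintain if more value columns are added. combine_with() below is a hypothetical helper, not part of purrr: it returns a two-argument function that reduce() can use, applying the listed binary function to each named column and leaving the other columns (here Group) as they are in the accumulator. Like the answer above, this sketch assumes the rows of every data frame are in the same Group order.

```r
library(purrr)
library(tibble)

# df_list rebuilt from the example values in the question
df_list <- list(
  tibble(Group = c("A", "B", "C"), Top_Group = c(1L, 0L, 0L),
         Efficiency = c(0.464688158128411, 0.652386676520109, 0.282913417555392)),
  tibble(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L),
         Efficiency = c(0.120292583014816, 0.0356206290889531, 0.37196880299598)),
  tibble(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L),
         Efficiency = c(0.261322160949931, 0.383351784432307, 0.754808459430933))
)

# Hypothetical helper: builds a reducer that combines two row-aligned
# data frames column by column with the function given for that column.
combine_with <- function(fns) {
  function(a, b) {
    for (col in names(fns)) {
      a[[col]] <- fns[[col]](a[[col]], b[[col]])
    }
    a  # columns not listed in fns are kept from the accumulator
  }
}

result <- reduce(df_list, combine_with(list(Top_Group = `+`, Efficiency = pmax)))
result
```

Since the accumulator starts as the first tibble, the result keeps the original column order and types.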
Answer:
In base R you could just do
Reduce(function(a, b) cbind(a[1], a[2] + b[2], pmax(a[3], b[3])), df_list)
#> Group Top_Group Efficiency
#> 1 A 1 0.4646882
#> 2 B 2 0.6523867
#> 3 C 0 0.7548085
Answer:
Yet another solution with reduce, full_join, and then a rowwise summarize:
library(tidyverse)
df_list %>%
reduce(full_join, by = "Group") %>%
rowwise() %>%
summarize(Group = Group,
Top_Group = sum(c_across(starts_with("Top_Group"))),
Efficiency = max(c_across(starts_with("Efficiency")))) %>%
ungroup()
# A tibble: 3 x 3
Group Top_Group Efficiency
<chr> <int> <dbl>
1 A 1 0.465
2 B 2 0.652
3 C 0 0.755
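rowwise() evaluates the summary one row at a time. A sketch of a vectorised alternative on the same joined table selects the value columns by prefix (full_join() suffixes the clashing names with .x and .y) and combines them across columns with rowSums() and pmax(). The as.integer() call is an added assumption to keep Top_Group an integer, since rowSums() returns doubles.

```r
library(dplyr)
library(purrr)

# df_list rebuilt from the example values in the question
df_list <- list(
  tibble(Group = c("A", "B", "C"), Top_Group = c(1L, 0L, 0L),
         Efficiency = c(0.464688158128411, 0.652386676520109, 0.282913417555392)),
  tibble(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L),
         Efficiency = c(0.120292583014816, 0.0356206290889531, 0.37196880299598)),
  tibble(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L),
         Efficiency = c(0.261322160949931, 0.383351784432307, 0.754808459430933))
)

# One wide row per Group; value columns become Top_Group.x, Top_Group.y, Top_Group, ...
joined <- reduce(df_list, function(x, y) full_join(x, y, by = "Group"))

result <- tibble(
  Group = joined$Group,
  # sum across all Top_Group* columns for each row
  Top_Group = as.integer(rowSums(select(joined, starts_with("Top_Group")))),
  # elementwise max across all Efficiency* columns
  Efficiency = do.call(pmax, unname(as.list(select(joined, starts_with("Efficiency")))))
)
result
```

A join also matches rows by Group rather than by position, so this approach does not need the data frames to share a row order.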
Answer:
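A base R option using aggregate + ave. ave() first replaces every Efficiency value with its group maximum divided by the group size, so summing that column with aggregate(..., sum) returns the group maximum (up to floating-point rounding), while Top_Group is summed directly: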
aggregate(
. ~ Group,
transform(
do.call(
rbind,
df_list
),
Efficiency = ave(
Efficiency,
Group,
FUN = function(x) max(x) / length(x)
)
), sum
)
gives
Group Top_Group Efficiency
1 A 1 0.4646882
2 B 2 0.6523867
3 C 0 0.7548085
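If the max(x) / length(x) trick feels too clever, a more explicit base R sketch is to aggregate each column with its own function and merge the two results by Group. df_list is rebuilt here with data.frame() (same values as the example) so the block needs no packages; do.call(rbind, ...) works the same way on the original tibbles.

```r
# df_list rebuilt from the example values in the question, as plain data frames
df_list <- list(
  data.frame(Group = c("A", "B", "C"), Top_Group = c(1L, 0L, 0L),
             Efficiency = c(0.464688158128411, 0.652386676520109, 0.282913417555392)),
  data.frame(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L),
             Efficiency = c(0.120292583014816, 0.0356206290889531, 0.37196880299598)),
  data.frame(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L),
             Efficiency = c(0.261322160949931, 0.383351784432307, 0.754808459430933))
)

# Stack all simulation runs into one long data frame
all_runs <- do.call(rbind, df_list)

# Summarise each column with its own function, then join the summaries by Group
result <- merge(
  aggregate(Top_Group ~ Group, data = all_runs, FUN = sum),
  aggregate(Efficiency ~ Group, data = all_runs, FUN = max),
  by = "Group"
)
result
```

The cost is one aggregate() call per summary function, but each column's summary is stated directly.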
Answer:
You almost had it! Check out ?unnest:
library(tidyverse)
df_list %>%
tibble() %>%
unnest(cols = c(.)) %>%
group_by(Group) %>%
summarise(Top_Group = sum(Top_Group), Efficiency = max(Efficiency))
# A tibble: 3 x 3
Group Top_Group Efficiency
<chr> <int> <dbl>
1 A 1 0.465
2 B 2 0.652
3 C 0 0.755