I've got a list of more than 6000 dataframes with about 6000 observations of 2000 variables. I would like to calculate a mean dataframe of 6000x2000 in which each value is the mean of that same position across all dataframes (as example: mean.df[1,1] = mean(df1[1,1],df2[1,1],df3[1,1],.....dfN[1,1]
)
I've prepared a short example to illustrate what I need
# Build list of dataframes
df.X <- data.frame("A1" = seq(1,5,1), "A2" = seq(2,10,2) , "A3" = seq(10,18,2))
df.Y <- data.frame("B1" = seq(0,8,2), "B2" = seq(4,8,1) , "B3" = seq(10,18,2))
df.Z <- data.frame("C1" = seq(2,10,2), "C2" = 1:5 , "C3" = seq(10,18,2))
df.list <- list(df.X,df.Y,df.Z)
#Desired output
m1 M2 M3
1.0 2.3 10.0
2.7 3.7 12.0
4.3 5.0 14.0
6.0 6.3 16.0
7.7 7.7 18.0
I have made some attempts with purrr
with reduce
and map
but had no success. Mabye apply?
Thanks in advance
CodePudding user response:
Base R
Sum all elements of the list together, and divide by number of elements.
Does not work with NA-elements!
Reduce(` `, df.list) / length(df.list)
A1 A2 A3
1 1.000000 2.333333 10
2 2.666667 3.666667 12
3 4.333333 5.000000 14
4 6.000000 6.333333 16
5 7.666667 7.666667 18
All there is left to do is setting the colnames.
CodePudding user response:
A tidyverse
solution. We can combine all data frames, create an ID
column by each Group
, and calculate the mean. This solution is not as succinct as Wimpel's solution, but it can handle missing values by setting na.rm = TRUE
in the summarize
and across
call.
library(tidyverse)
df_mean <- df.list %>%
map_dfr(~setNames(.x, paste0("A", 1:ncol(.x))), .id = "Group") %>%
group_by(Group) %>%
mutate(ID = 1:n()) %>%
group_by(ID) %>%
summarize(across(-Group, .fns = mean, na.rm = TRUE)) %>%
select(-ID)
# # A tibble: 5 x 3
# A1 A2 A3
# <dbl> <dbl> <dbl>
# 1 1 2.33 10
# 2 2.67 3.67 12
# 3 4.33 5 14
# 4 6 6.33 16
# 5 7.67 7.67 18
CodePudding user response:
An option is also to convert to array
and use colMeans
apply(array(unlist(df.list), c(dim(df.list[[1]]), length(df.list))), 2, rowMeans)
-output
[,1] [,2] [,3]
[1,] 1.000000 2.333333 10
[2,] 2.666667 3.666667 12
[3,] 4.333333 5.000000 14
[4,] 6.000000 6.333333 16
[5,] 7.666667 7.666667 18