Taking Averages Across Lists

Time:06-14

I have this list in R. (I only have access to the list itself, not to d1, d2, d3, d4; I include them here only to make this question reproducible.)

d1 = data.frame(v1 = rnorm(20,20,20), c2 = rnorm(20,20,20), id = 1:20)
d2 = data.frame(v1 = rnorm(20,20,20), c2 = rnorm(20,20,20), id = 1:20)
d3 = data.frame(v1 = rnorm(20,20,20), c2 = rnorm(20,20,20), id = 1:20)
d4 = data.frame(v1 = rnorm(20,20,20), c2 = rnorm(20,20,20), id = 1:20)

my_list = list(d1,d2, d3, d4)

I want to create a new data frame (20 rows, 2 columns) that contains the average value of v1 and c2 for each id. I tried this code:

final_data = data.frame(mean_v1 = mean(my_list[[1]][1] + my_list[[2]][1] + my_list[[3]][1] + my_list[[4]][1]), mean_c2 = mean(my_list[[1]][2] + my_list[[2]][2] + my_list[[3]][2] + my_list[[4]][2]))

But this is giving me a warning message and an empty result:

Warning messages:
1: In mean.default(my_list[[1]][1] + my_list[[2]][1] + my_list[[3]][1] +  :
  argument is not numeric or logical: returning NA
2: In mean.default(my_list[[1]][2] + my_list[[2]][2] + my_list[[3]][2] +  :
  argument is not numeric or logical: returning NA
> final_data
  mean_v1 mean_c2
1      NA      NA

Is there a better way to do this, one that works and does not require writing my_list[] again and again?

The result should look something like this:

       mean_v1     mean_c2 id
1   37.1730736  49.3012881  1
2   -0.7861481  -9.5201620  2
3   47.2629669  -4.0249373  3
4  -25.4266542  16.6597656  4
5   18.1102329  15.0924825  5
6   -7.7148600  21.0085447  6
7   37.2753666  21.7701739  7
8   53.5393623   0.2115059  8
9   12.2578949 -11.6501821  9
10  18.3532267  44.0709866 10
11  -0.7528975  15.0990824 11
12  12.8841962  25.8737362 12
13  43.1026041  16.5399091 13
14  -1.6249458  39.6677542 14
15  23.4145601  33.0496240 15
16  -6.8168808   7.8944851 16
17 -18.8746847  16.3386228 17
18  32.8151604  14.7895162 18
19  -0.3587592  -3.2358145 19
20  11.7361017  -3.5663637 20

Thank you!

CodePudding user response:

Bind the list elements into a single data frame, then take the mean of each column grouped by id:

library(dplyr)
bind_rows(my_list) %>% 
  group_by(id) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = 'drop')

Or in base R, using aggregate and rbind:

aggregate(.~ id, do.call(rbind, my_list), mean)
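
As a rough sketch of the base R route, the stacked data frame can be aggregated and its columns renamed to match the layout asked for in the question. The seed and the rebuilt my_list are assumptions that only make the snippet self-contained, and the names stacked and means are illustrative.

```r
# Stand-in for the list in the question (the seed is an assumption).
set.seed(1)
my_list <- lapply(1:4, function(i) {
  data.frame(v1 = rnorm(20, 20, 20), c2 = rnorm(20, 20, 20), id = 1:20)
})

# Stack the four data frames into one with 80 rows.
stacked <- do.call(rbind, my_list)

# Mean of every non-grouping column (v1 and c2) for each id.
means <- aggregate(. ~ id, data = stacked, FUN = mean)

# Rename and reorder the columns to match the requested output.
final_data <- data.frame(
  mean_v1 = means$v1,
  mean_c2 = means$c2,
  id      = means$id
)
head(final_data)
```

Because the grouping is done on id, this version does not depend on the rows appearing in the same order in every data frame.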

As for the error in the question: mean needs a numeric vector as input, but single brackets, as in my_list[[1]][1], return a data.frame with one column, whereas double brackets, as in my_list[[1]][[1]], return the underlying vector:

> str(my_list[[1]][1])
'data.frame':   20 obs. of  1 variable:
 $ v1: num  -19.1 10.7 -1.8 26.4 28.8 ...
> str(my_list[[1]][[1]])
 num [1:20] -19.1 10.7 -1.8 26.4 28.8 ...

so mean returns NA with a warning:

mean(my_list[[1]][1])
[1] NA
Warning message:
In mean.default(my_list[[1]][1]) :
  argument is not numeric or logical: returning NA

Instead, extract the column as a vector (note that this gives one overall mean for that column, not one mean per id):

mean(my_list[[1]][[1]])
[1] 18.28274
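
If every data frame in the list has its rows in the same id order (an assumption, although the question's data satisfies it), the element-wise sum attempted in the question can be sketched with Reduce, which avoids typing my_list[[i]] repeatedly. The seed, the rebuilt my_list, and the names value_cols and sums are illustrative, not part of the original post.

```r
# Stand-in for the list in the question (the seed is an assumption).
set.seed(1)
my_list <- lapply(1:4, function(i) {
  data.frame(v1 = rnorm(20, 20, 20), c2 = rnorm(20, 20, 20), id = 1:20)
})

# Keep only the value columns, add the data frames element-wise,
# then divide by the number of data frames to get per-row means.
# Assumes identical row order (by id) in every list element.
value_cols <- c("v1", "c2")
sums <- Reduce(`+`, lapply(my_list, function(d) d[value_cols]))

final_data <- data.frame(
  mean_v1 = sums$v1 / length(my_list),
  mean_c2 = sums$c2 / length(my_list),
  id      = my_list[[1]]$id
)
head(final_data)
```

When the row order might differ between list elements, grouping on id is the safer choice.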

CodePudding user response:

With the native pipe operator (introduced in R 4.1.0) and its _ placeholder (introduced in R 4.2.0):

my_list |>
  do.call(rbind, args = _) |>
  aggregate(. ~ id, data = _, mean)