I have this list in R (I only have access to the list - not d1, d2, d3, d4... I just included these to make this stackoverflow question reproducible):
d1 = data.frame(v1 = rnorm(20,20,20), c2 = rnorm(20,20,20), id = 1:20)
d2 = data.frame(v1 = rnorm(20,20,20), c2 = rnorm(20,20,20), id = 1:20)
d3 = data.frame(v1 = rnorm(20,20,20), c2 = rnorm(20,20,20), id = 1:20)
d4 = data.frame(v1 = rnorm(20,20,20), c2 = rnorm(20,20,20), id = 1:20)
my_list = list(d1,d2, d3, d4)
I want to create a new data frame (20 rows) that contains the average value of v1 and c2 for each id. I tried this code:
final_data = data.frame(mean_v1 = mean(my_list[[1]][1], my_list[[2]][1], my_list[[3]][1], my_list[[4]][1]), mean_c2 = mean(my_list[[1]][2], my_list[[2]][2], my_list[[3]][2], my_list[[4]][2]))
But this gives me warning messages and a result that contains only NA:
Warning messages:
1: In mean.default(my_list[[1]][1], my_list[[2]][1], my_list[[3]][1], :
  argument is not numeric or logical: returning NA
2: In mean.default(my_list[[1]][2], my_list[[2]][2], my_list[[3]][2], :
  argument is not numeric or logical: returning NA
> final_data
mean_v1 mean_c2
1 NA NA
Is there a better way to do this, one that works and doesn't require me to write my_list[] manually again and again?
In the end, the result should look something like this:
mean_v1 mean_c2 id
1 37.1730736 49.3012881 1
2 -0.7861481 -9.5201620 2
3 47.2629669 -4.0249373 3
4 -25.4266542 16.6597656 4
5 18.1102329 15.0924825 5
6 -7.7148600 21.0085447 6
7 37.2753666 21.7701739 7
8 53.5393623 0.2115059 8
9 12.2578949 -11.6501821 9
10 18.3532267 44.0709866 10
11 -0.7528975 15.0990824 11
12 12.8841962 25.8737362 12
13 43.1026041 16.5399091 13
14 -1.6249458 39.6677542 14
15 23.4145601 33.0496240 15
16 -6.8168808 7.8944851 16
17 -18.8746847 16.3386228 17
18 32.8151604 14.7895162 18
19 -0.3587592 -3.2358145 19
20 11.7361017 -3.5663637 20
Thank you!
CodePudding user response:
We can bind the list elements into a single data frame and then compute the mean grouped by id:
library(dplyr)
# Stack all data frames, then average every remaining column within each id
bind_rows(my_list) %>%
  group_by(id) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = 'drop')
Or with base R, using aggregate and rbind:
aggregate(.~ id, do.call(rbind, my_list), mean)
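To get the column names and order shown in the question (mean_v1, mean_c2, id), the aggregate result can be renamed and reordered afterwards. This is a minimal sketch, assuming every data frame in the list has the columns v1, c2 and id. The small deterministic my_list below is made up for illustration so the result can be checked by hand; it is not the data from the question.

```r
# Hypothetical stand-in for the question's list: three data frames, ids 1..4
my_list <- list(
  data.frame(v1 = c(1, 2, 3, 4), c2 = c(10, 20, 30, 40), id = 1:4),
  data.frame(v1 = c(3, 4, 5, 6), c2 = c(30, 40, 50, 60), id = 1:4),
  data.frame(v1 = c(5, 6, 7, 8), c2 = c(50, 60, 70, 80), id = 1:4)
)

# Stack all data frames, then average v1 and c2 within each id
stacked <- do.call(rbind, my_list)
res <- aggregate(cbind(v1, c2) ~ id, data = stacked, FUN = mean)

# Rename and reorder to match the layout requested in the question
final_data <- data.frame(mean_v1 = res$v1, mean_c2 = res$c2, id = res$id)
print(final_data)
```

One thing to keep in mind: the formula method of aggregate drops rows containing NA by default (na.action = na.omit), so it may not behave exactly like na.rm = TRUE in the dplyr version if only one of the columns is missing in a row.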
Regarding the issue in the OP's post: mean needs a numeric vector as input, whereas my_list[[1]][1] (single brackets) returns a data.frame with one column. Compare the two forms of extraction:
> str(my_list[[1]][1])
'data.frame': 20 obs. of 1 variable:
$ v1: num -19.1 10.7 -1.8 26.4 28.8 ...
> str(my_list[[1]][[1]])
num [1:20] -19.1 10.7 -1.8 26.4 28.8 ...
and thus mean returns NA:
mean(my_list[[1]][1])
[1] NA
Warning message:
In mean.default(my_list[[1]][1]) :
argument is not numeric or logical: returning NA
Instead, it should be:
mean(my_list[[1]][[1]])
[1] 18.28274
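There is a second problem in the original attempt: mean() averages only its first argument, so passing several columns as separate arguments would not give a per-id average even with [[ ]]. Below is a sketch of one way around both problems without typing my_list[[i]] repeatedly. It assumes all data frames list the ids in the same row order, which the question does not guarantee (the rbind/aggregate approach does not need this assumption). The tiny my_list and the helper mean_of are made up for illustration.

```r
# Hypothetical stand-in for the question's list (rows in the same id order)
my_list <- list(
  data.frame(v1 = c(1, 2), c2 = c(10, 20), id = 1:2),
  data.frame(v1 = c(3, 6), c2 = c(30, 60), id = 1:2)
)

# `[` keeps the data.frame wrapper, `[[` extracts the numeric vector
is.data.frame(my_list[[1]]["v1"])   # TRUE
is.numeric(my_list[[1]][["v1"]])    # TRUE

# sapply() builds a matrix with one column per list element;
# rowMeans() then averages across the data frames, row by row
mean_of <- function(col) rowMeans(sapply(my_list, function(d) d[[col]]))

final_data <- data.frame(
  mean_v1 = mean_of("v1"),
  mean_c2 = mean_of("c2"),
  id      = my_list[[1]]$id
)
print(final_data)
```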
CodePudding user response:
With the native pipe operator |> (introduced in R 4.1.0) and its _ placeholder (introduced in R 4.2.0):
# Use . ~ id so that both v1 and c2 are averaged, not only v1
my_list |>
  do.call(rbind, args = _) |>
  aggregate(. ~ id, data = _, FUN = mean)