Average of dataframes values in a list by name

Time:10-14

I have a question related to R and nested lists. Let's assume I have a nested list with this structure:

library(tidyr)
library(purrr)

simul<-list(
  "Q"=tibble(
    months = seq(1,12,by=1),
    run1 = runif(12),
    run2 = runif(12),
    run3= runif(12)),
  "ET1"=tibble(
    months = seq(1,12,by=1),
    run1 = runif(12),
    run2 = runif(12),
    run3= runif(12)),
  "ET2"=tibble(
    months = seq(1,12,by=1),
    run1 = runif(12),
    run2 = runif(12),
    run3= runif(12)),
  "ET3"=tibble(
    months = seq(1,12,by=1),
    run1 = runif(12),
    run2 = runif(12),
    run3= runif(12))
  )

I am trying to average the "ET" variables (ET1, ET2, ET3) run by run, while keeping the lower-level list structure. Variables that belong together share the same starting characters and differ only in the number at the end of the name.

So far I have solved the problem as shown below. Could you suggest a better way, one that I can easily apply to a larger list with more "variables" and "runs"?

nested_avg <-list("Q"=simul["Q"],
                  "ET"=tibble(months = simul$ET1$months,
                              run1=rowMeans(do.call(cbind, simul[c("ET1","ET2","ET3")]%>% map(2))),
                              run2=rowMeans(do.call(cbind, simul[c("ET1","ET2","ET3")]%>% map(3))),
                              run3=rowMeans(do.call(cbind, simul[c("ET1","ET2","ET3")]%>% map(4)))
                  ))

Thank you so much for your answer.

CodePudding user response:

Some enframe then deframe with grouping does the trick:

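library(dplyr)    # group_by(), summarise(), n()
library(stringr)  # str_remove_all()
library(tibble)   # enframe(), deframe()

simul %>%
  setNames(str_remove_all(names(.), "\\d")) %>%
  enframe() %>%
  group_by(name) %>%
  summarise(value = list(as_tibble(Reduce("+", value) / n()))) %>%
  deframe()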

Note that as_tibble is needed because the element-wise arithmetic inside summarise does not return a tibble (base R data-frame arithmetic yields a plain data.frame).
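
To see the sum-then-divide step on its own, here is a minimal base R sketch of the same idea. It assumes toy data built with data.frame instead of tibble, and the names simul_demo, make_df and avg_by_prefix are made up for illustration; it is not the answerer's code.

```r
# Toy data in the same shape as `simul` (values are random, names are illustrative).
set.seed(1)
make_df <- function() {
  data.frame(months = 1:12, run1 = runif(12), run2 = runif(12), run3 = runif(12))
}
simul_demo <- list(Q = make_df(), ET1 = make_df(), ET2 = make_df(), ET3 = make_df())

# Group key: the element name with trailing digits removed ("ET1" -> "ET").
prefix <- sub("\\d+$", "", names(simul_demo))

# Split the list by prefix, then add the data frames of each group cell by cell
# and divide by the group size. A group of one (here "Q") is returned unchanged
# apart from the division by 1.
avg_by_prefix <- lapply(split(simul_demo, prefix), function(dfs) {
  Reduce(`+`, dfs) / length(dfs)
})

str(avg_by_prefix, max.level = 1)
```

Because the months column is identical in every data frame, averaging it returns the same months, so it does not need special handling here.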

CodePudding user response:

Looping and mapping with base R's Map (base::Map, not purrr::map):

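# Logical index of the list elements whose names start with "ET"
vars <- startsWith(names(simul), "ET")
out <- list(
    "Q" = simul["Q"],
    # Map walks the ET tibbles column by column; the \(...) shorthand needs R >= 4.1
    "ET" = as_tibble(do.call(Map, c(\(...) rowMeans(cbind(...)), simul[vars])))
)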

identical(out, nested_avg)
#[1] TRUE

And an alternative method flattening everything to an array:

out <- list(
  "Q" = simul["Q"],
  "ET" = as_tibble(apply(sapply(simul[vars], as.matrix, simplify="array"), 1:2, mean))
)

identical(out, nested_avg)
#[1] TRUE
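
One possible way to scale the array idea to any number of variable groups and runs is sketched below. The helper name summarise_groups and the toy data are hypothetical, and the sketch assumes that every data frame in a group has the same dimensions and only numeric columns.

```r
# Hypothetical helper (not from the answers above): summarise same-prefix
# data frames cell by cell with an arbitrary function.
summarise_groups <- function(lst, fun = mean) {
  # "ET1", "ET2", ... -> "ET"; names without trailing digits are unchanged.
  prefix <- sub("\\d+$", "", names(lst))
  lapply(split(lst, prefix), function(dfs) {
    # Stack the group's data frames into a rows x columns x members array.
    arr <- simplify2array(lapply(dfs, as.matrix))
    # Apply `fun` across the members for each (row, column) cell.
    as.data.frame(apply(arr, 1:2, fun))
  })
}

# Toy data: the number of runs is arbitrary, two are used here.
set.seed(42)
toy <- function() data.frame(months = 1:12, run1 = runif(12), run2 = runif(12))
demo <- list(Q = toy(), ET1 = toy(), ET2 = toy(), ET3 = toy())

means   <- summarise_groups(demo)          # element-wise mean per group
medians <- summarise_groups(demo, median)  # same shape, different summary
```

The array form is slower than summing, but any summary function can be swapped in, which a plain sum-and-divide cannot do.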