R: Subsetting nested lists, select multiple entries

I frequently work with large datasets, so I sometimes create nested lists to reduce the number of objects in the environment.

To extract the first entry at every level of such a list, I write:

llra[[1]][[1]][[1]]

In some of my current scripts, the data are aligned so that the entries at the deepest level of the list are comparable across branches. If I want to compare them or run a calculation on them, it looks something like this:

mean(c(llra[[1]][[1]][[1]], llra[[1]][[2]][[1]], llra[[1]][[3]][[1]]))
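Note that mean() averages only its first argument: further positional arguments are matched to its trim and na.rm parameters instead of being pooled into the data, which is why the values have to be combined with c() first. A quick check with three made-up scalars:

```r
# mean(x, trim = 0, na.rm = FALSE, ...): only x is averaged
mean(2, 4, 9)     # 4 is matched to trim and 9 to na.rm; the result is 2, not 5
mean(c(2, 4, 9))  # combine the values into one vector first: returns 5
```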

Is there a way to subset the list differently, so that I could write something like this:

mean(llra[[1]][[c(1:3)]][[1]])
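For context on why that syntax does not work as hoped: [[ with a vector index performs recursive indexing, so x[[c(i, j, k)]] means x[[i]][[j]][[k]], a single element, and llra[[1]][[c(1:3)]][[1]] would try to walk five levels down rather than pick three branches. Selecting several elements at one level needs single brackets, followed by a function applied to each selected branch. A small sketch with made-up data standing in for llra:

```r
# hypothetical stand-in for llra: one branch holding three sub-lists
llra <- list(list(list(2), list(4), list(9)))

# [[ with a vector recurses: same as llra[[1]][[2]][[1]]
llra[[1]][[c(2, 1)]]   # 4

# [ with a vector keeps several sub-lists; then take [[1]] of each
picked <- sapply(llra[[1]][1:3], `[[`, 1)
picked                 # 2 4 9
mean(picked)           # 5
```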

Thanks for your help!

Answer:

Create a small helper function. It builds a grid of all index combinations with expand.grid, extracts the element at each combination, and finally unlists the result. No packages are used.

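unravel <- function(L, ...) {
  # one row per combination of the indices passed in ...;
  # each row is used as a recursive index: L[[c(i, j, k)]] is L[[i]][[j]][[k]]
  # (apply(..., simplify = FALSE) requires R >= 4.1.0)
  if (...length()) L <-
    apply(expand.grid(...), 1, function(ix) L[[ix]], simplify = FALSE)
  # flatten the extracted pieces into a single vector
  unlist(L)
}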

# test

L <- list(a = list(b = list(1:3, 4:5), c = list(11:12, 20:25)))

# Example 1

mean(unravel(L, 1, 1:2, 1))
## [1] 5.8

# check
mean(c(L[[1]][[1]][[1]], L[[1]][[2]][[1]]))
## [1] 5.8

# Example 2

mean(unravel(L, 1, 1, 1:2))
## [1] 3

# check
mean(c(L[[1]][[1]][[1]], L[[1]][[1]][[2]]))
## [1] 3
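To see what unravel() does internally, here is the index grid that expand.grid() builds for Example 1; each row is then used as a recursive [[ index. expand.grid() varies its first argument fastest, which determines the order of the extracted pieces.

```r
L <- list(a = list(b = list(1:3, 4:5), c = list(11:12, 20:25)))

idx <- expand.grid(1, 1:2, 1)  # 2 rows: (1, 1, 1) and (1, 2, 1)

# each row works as a recursive index, e.g. row 2 is L[[1]][[2]][[1]]
L[[unlist(idx[2, ])]]          # 11 12

# with several vector indices, the first one varies fastest
expand.grid(1:2, 1:2)$Var1     # 1 2 1 2
```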

Update

Generalized unravel so that it assumes neither three levels nor which level(s) are given as scalar or vector indices.
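A quick check of that generality on lists of other depths. unravel() is repeated verbatim so the snippet runs on its own; the lists two and deep are made-up examples. Like the helper itself, this needs R >= 4.1.0 for apply(..., simplify = FALSE).

```r
unravel <- function(L, ...) {
  if (...length()) L <-
    apply(expand.grid(...), 1, function(ix) L[[ix]], simplify = FALSE)
  unlist(L)
}

# two levels: second element of each branch
two <- list(list(1, 2), list(3, 4))
unravel(two, 1:2, 2)          # 2 4

# four levels: vector index at the deepest level
deep <- list(list(list(list(5, 6, 7))))
unravel(deep, 1, 1, 1, 2:3)   # 6 7

# no indices: the whole list is simply unlisted
unravel(two)                  # 1 2 3 4
```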

Answer:

You can use purrr's map_dbl() (from the map() family), as long as each innermost entry is a single number:

library(purrr)
mean(map_dbl(1:3, ~llra[[1]][[.x]][[1]]))
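If you would rather avoid the purrr dependency, base R's vapply() is a close equivalent of map_dbl(): it also insists that every result is a single number. The llra below is a made-up stand-in.

```r
# hypothetical stand-in for llra
llra <- list(list(list(2), list(4), list(9)))

# vapply() stops with an error unless each entry fits in numeric(1)
vals <- vapply(1:3, function(i) llra[[1]][[i]][[1]], numeric(1))
mean(vals)   # 5
```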

Answer:

Since you did not provide an example data set, I created one:

ua <- list(
    list(
        list(1),
        list(9),
        list(3),
        list(3)
        )
    )

You can create an expression and evaluate it in a loop:

e <- expression(ua[[1]][[j]][[1]])

I hard-coded the first and last indices and the list name, but you can change them if desired. As written, the variable ua is found in the global environment when e is evaluated.

Evaluating it in a loop gives you the values of the inner lists. The variable j is found in the local environment of the anonymous function \(j) { eval(e) } (the \(x) shorthand requires R >= 4.1.0):

sapply(1:4, \(j) { eval(e) })

# [1] 1 9 3 3
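As an alternative to relying on the anonymous function's scope, eval() also accepts a list as its envir argument, so j can be supplied directly; names not in the list, such as ua, are then looked up in the environment eval() was called from. A self-contained sketch repeating ua and e from above:

```r
ua <- list(list(list(1), list(9), list(3), list(3)))
e <- expression(ua[[1]][[j]][[1]])

# j comes from the list; ua is found via the calling environment
eval(e, list(j = 2))                            # 9
sapply(1:4, function(k) eval(e, list(j = k)))   # 1 9 3 3
```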

You can then apply any function you want to the result:

cat(mean(sapply(1:4, \(j) { eval(e) })), 'millions \u2620') 

# 4 millions ☠

I assume there is another way to do this. But at least, with this expression you can write a reasonably flexible function for such cases, or you may find a completely different and presumably simpler way of doing the same thing.
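One such simpler route, sketched under the assumption of the same three-level layout as ua: an ordinary function that takes the list and the indices as arguments, so nothing is hard-coded or stored as an expression. inner_values() is a hypothetical name, not from any package.

```r
ua <- list(list(list(1), list(9), list(3), list(3)))

# hypothetical helper: returns lst[[first]][[j]][[last]] for every j in js
inner_values <- function(lst, js, first = 1, last = 1) {
  sapply(js, function(j) lst[[first]][[j]][[last]])
}

inner_values(ua, 1:4)          # 1 9 3 3
mean(inner_values(ua, 1:4))    # 4
```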