Extract data frames from nested list

Time:12-29

I have a nested list that contains some data frames, and the data frames can appear at any level of the list. What I want to end up with is a flat list, i.e. just one level, whose elements are only the data frames, with everything else discarded.

I have come up with a solution for this, but it looks very clunky and I am sure there ought to be a more elegant solution.

Importantly, I'm looking for something in base R that can extract data frames at any level inside the nested list. I have tried unlist() and dabbled with rapply(), but have not found a satisfying solution.

Example code follows: an example list, what I am actually trying to achieve, and my own solution which I am not very happy with. Thanks for any help!

# extract dfs from list

# example of multi-level list with some dfs in it
# note, dfs could be nested at any level
problem1 <- list(x1 = 1,
              x2 = list(
                x3 = "dog",
                x4 = data.frame(cats = c(1, 2),
                               pigs = c(3, 4))
              ),
              x5 = data.frame(sheep = c(1,2,3),
                             goats = c(4,5,6)),
              x6 = list(a = 2,
                       b = "c"),
              x7 = head(cars,5))

# want to end up with flat list like this (names format is optional)
result1 <- list(x2.x4 = data.frame(cats = c(1, 2),
                                   pigs = c(3, 4)),
                x5 = data.frame(sheep = c(1,2,3),
                                goats = c(4,5,6)),
                x7 = head(cars,5))

# my solution (not very satisfactory)

exit_loop <- FALSE
while (!exit_loop) {
  # find dfs (logical)
  idfs <- sapply(problem1, is.data.frame)
  # check if all data frames
  exit_loop <- all(idfs)
  # remove anything not df or list
  problem1 <- problem1[idfs | sapply(problem1, is.list)]
  # find dfs again (logical)
  idfs <- sapply(problem1, is.data.frame)
  # unlist only the non-df part
  problem1 <- c(problem1[idfs], unlist(problem1[!idfs], recursive = FALSE))
}

CodePudding user response:

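Maybe consider a simple recursive function like this:

find_df <- function(x) {
  # a data frame is itself a list, so test for it first and wrap it
  # in a list so that unlist() below does not split it into columns
  if (is.data.frame(x))
    return(list(x))
  # drop anything that is neither a data frame nor a list
  if (!is.list(x))
    return(NULL)
  # recurse into each element and flatten the results by one level
  unlist(lapply(x, find_df), recursive = FALSE)
}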
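
The order of the two checks matters because a data frame is itself a list in R. That is also the likely reason unlist() and rapply() on their own are awkward here: both descend into the data frame's columns. A minimal sketch of that behaviour, using a small made-up list (`nested` is a hypothetical name, not part of the question):

```r
# Hypothetical example data, not taken from the question
nested <- list(a = 1, b = list(df = data.frame(u = 1:2, v = 3:4)))

# A data frame is a list of columns, so is.list() is TRUE for it
is.list(nested$b$df)

# unlist() flattens all the way down to individual values,
# so the data frame structure is lost
flat <- unlist(nested)
names(flat)

# rapply() also recurses into the data frame and visits each column
cols <- rapply(nested, function(col) class(col), how = "unlist")
cols
```

Checking is.data.frame() before is.list() and wrapping the hit in list() is what keeps each data frame intact when the results are flattened one level at a time.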

Results

> find_df(problem1)
$x2.x4
  cats pigs
1    1    3
2    2    4

$x5
  sheep goats
1     1     4
2     2     5
3     3     6

$x7
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
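
As a further check that the function works at any depth, here is a small sketch on made-up data (`deep` and `none` are hypothetical names). It repeats the function so the snippet runs on its own, and it also shows what comes back when the list contains no data frames at all:

```r
# Recursive helper as given in the answer above
find_df <- function(x) {
  if (is.data.frame(x)) return(list(x))
  if (!is.list(x)) return(NULL)
  unlist(lapply(x, find_df), recursive = FALSE)
}

# Hypothetical test data: one data frame at the top level,
# another one three levels down
deep <- list(
  top = data.frame(id = 1:2),
  l1 = list(note = "text",
            l2 = list(l3 = data.frame(val = c(0.5, 1.5))))
)

found <- find_df(deep)
names(found)                                   # names record the path
all(vapply(found, is.data.frame, logical(1)))  # only data frames remain

# With no data frames present the helper returns NULL, not an empty list
none <- find_df(list(a = 1, b = list(c = "x")))
is.null(none)
```

If downstream code expects a list in every case, the NULL result for "nothing found" may need to be handled by the caller.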

CodePudding user response:

There is a function called rrapply, from the rrapply package (so not base R), that you could use. The only downside is that I do not get the required names:

rrapply::rrapply(problem1, is.data.frame, classes = 'data.frame', how = 'flatten')

$x4
  cats pigs
1    1    3
2    2    4

$x5
  sheep goats
1     1     4
2     2     5
3     3     6

$x7
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16