Reduce/flatten levels in a list of dataframes with an arbitrarily deep hierarchy in R

Time:06-17

I have multiple lists of dataframes, stored at different levels of the hierarchy inside another list. I want to "flatten" the list so that only the lowest level of the hierarchy remains. I can't use unlist() or purrr::flatten() because they unravel the dataframes themselves (a dataframe is a list of columns).

Is there a simple, generic way to remove the hierarchical structure and create a list where only two levels remain (a list of lists of dataframes)?
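To illustrate the problem described above, here is a minimal sketch on a made-up toy list (the names `nested`, `fully`, `one_level` and `cols` are only for illustration). It shows that a full unlist() reduces a dataframe to a plain vector, and that even a one-level unlist() splits a dataframe into its columns when the dataframe is a direct element of the list being flattened.

```r
# Hypothetical toy data, not the structure from the question.
nested <- list(a = list(data.frame(x = 1:2, y = 3:4)))

# Recursive unlist: the dataframe is dissolved into an atomic vector.
fully <- unlist(nested)
is.atomic(fully)

# One level only: the inner list is spliced out and the dataframe survives.
one_level <- unlist(nested, recursive = FALSE)
is.data.frame(one_level[[1]])

# But if the dataframe itself is a direct element, it is split into columns,
# because a dataframe is a list under the hood.
cols <- unlist(list(data.frame(x = 1:2, y = 3:4)), recursive = FALSE)
is.data.frame(cols)
```

So any generic approach has to treat dataframes as leaves explicitly rather than rely on is.list() alone.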


Code example:
Generate data structure:
library(dplyr)

n <- 12
df <- lapply(1:3, function(x) {
    x <- lapply(sample.int(4,n, replace = TRUE), function(y) {
        ceiling(y*runif(100))}
    ) %>% as.data.frame()
    names(x) <- letters[1:n]
    return(x)
})

my_list <- lst()
for (n in 1:3) {
    my_list$a[[n]] <- df[[n]][,1:3]
}
for (n in 1:3) {
    my_list$b$c[[n]] <- df[[n]][,4:6]
}
for (n in 1:3) {
    my_list$a$b$d$e[[n]] <- df[[n]][,7:9]
}

my_list %>% str()
Working code for what I want:
lst(
    a = my_list$a[1:3],
    b = my_list$a$b$d$e,
    c = my_list$b$c
    
) %>% str()

Outputs:
Multilevel hierarchical structure:
List of 2
 $ a:List of 4
  ..$  :'data.frame':   100 obs. of  3 variables:
  .. ..$ a: num [1:100] 2 1 1 1 1 1 2 2 2 1 ...
  .. ..$ b: num [1:100] 1 1 1 2 2 1 2 2 2 2 ...
  .. ..$ c: num [1:100] 2 1 1 2 1 1 1 2 1 2 ...
  ..$  :'data.frame':   100 obs. of  3 variables:
  .. ..$ a: num [1:100] 2 2 1 1 2 1 3 3 1 3 ...
  .. ..$ b: num [1:100] 1 1 3 2 3 1 3 3 3 3 ...
  .. ..$ c: num [1:100] 1 2 2 1 3 2 4 3 3 1 ...
  ..$  :'data.frame':   100 obs. of  3 variables:
  .. ..$ a: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ b: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ c: num [1:100] 2 2 1 1 1 1 1 1 1 2 ...
  ..$ b:List of 1
  .. ..$ d:List of 1
  .. .. ..$ e:List of 3
  .. .. .. ..$ :'data.frame':   100 obs. of  3 variables:
  .. .. .. .. ..$ g: num [1:100] 3 3 1 3 1 1 1 3 1 2 ...
  .. .. .. .. ..$ h: num [1:100] 1 1 2 1 1 1 1 2 1 1 ...
  .. .. .. .. ..$ i: num [1:100] 1 1 2 2 2 1 1 2 2 1 ...
  .. .. .. ..$ :'data.frame':   100 obs. of  3 variables:
  .. .. .. .. ..$ g: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. ..$ h: num [1:100] 2 4 4 4 3 3 3 2 4 4 ...
  .. .. .. .. ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ :'data.frame':   100 obs. of  3 variables:
  .. .. .. .. ..$ g: num [1:100] 2 1 3 2 3 1 1 2 1 2 ...
  .. .. .. .. ..$ h: num [1:100] 1 2 1 2 1 1 1 1 1 2 ...
  .. .. .. .. ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
 $ b:List of 1
  ..$ c:List of 3
  .. ..$ :'data.frame': 100 obs. of  3 variables:
  .. .. ..$ d: num [1:100] 2 2 2 1 1 1 2 1 1 1 ...
  .. .. ..$ e: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ f: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ :'data.frame': 100 obs. of  3 variables:
  .. .. ..$ d: num [1:100] 1 2 2 2 1 2 2 2 1 1 ...
  .. .. ..$ e: num [1:100] 1 2 2 1 2 1 1 1 2 2 ...
  .. .. ..$ f: num [1:100] 2 2 1 1 1 2 2 1 1 1 ...
  .. ..$ :'data.frame': 100 obs. of  3 variables:
  .. .. ..$ d: num [1:100] 2 3 3 1 3 4 4 4 1 3 ...
  .. .. ..$ e: num [1:100] 1 2 2 1 1 1 3 2 3 3 ...
  .. .. ..$ f: num [1:100] 3 3 3 3 1 2 2 2 3 1 ...
The desired output, a two-level list structure:
List of 3
 $ a:List of 3
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ a: num [1:100] 2 1 1 1 1 1 2 2 2 1 ...
  .. ..$ b: num [1:100] 1 1 1 2 2 1 2 2 2 2 ...
  .. ..$ c: num [1:100] 2 1 1 2 1 1 1 2 1 2 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ a: num [1:100] 2 2 1 1 2 1 3 3 1 3 ...
  .. ..$ b: num [1:100] 1 1 3 2 3 1 3 3 3 3 ...
  .. ..$ c: num [1:100] 1 2 2 1 3 2 4 3 3 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ a: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ b: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ c: num [1:100] 2 2 1 1 1 1 1 1 1 2 ...
 $ b:List of 3
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ g: num [1:100] 3 3 1 3 1 1 1 3 1 2 ...
  .. ..$ h: num [1:100] 1 1 2 1 1 1 1 2 1 1 ...
  .. ..$ i: num [1:100] 1 1 2 2 2 1 1 2 2 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ g: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ h: num [1:100] 2 4 4 4 3 3 3 2 4 4 ...
  .. ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ g: num [1:100] 2 1 3 2 3 1 1 2 1 2 ...
  .. ..$ h: num [1:100] 1 2 1 2 1 1 1 1 1 2 ...
  .. ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
 $ c:List of 3
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ d: num [1:100] 2 2 2 1 1 1 2 1 1 1 ...
  .. ..$ e: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ f: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ d: num [1:100] 1 2 2 2 1 2 2 2 1 1 ...
  .. ..$ e: num [1:100] 1 2 2 1 2 1 1 1 2 2 ...
  .. ..$ f: num [1:100] 2 2 1 1 1 2 2 1 1 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ d: num [1:100] 2 3 3 1 3 4 4 4 1 3 ...
  .. ..$ e: num [1:100] 1 2 2 1 1 1 3 2 3 3 ...
  .. ..$ f: num [1:100] 3 3 3 3 1 2 2 2 3 1 ...

CodePudding user response:

One option would be to flatten the list into a single list of data frames and then split it into a list of lists of data frames, grouped by name.

flatten <- function(x) {
  while (any(vapply(x, inherits, logical(1L), 'list'))) {
    x <- lapply(x, function(xx)
      if (inherits(xx, 'list'))
        xx else list(xx))
    x <- unlist(x, recursive = FALSE)
  }
  x
}

fl <- flatten(my_list)
# strip the trailing index that unlist() appends, then group by the remaining path
str(split(fl, gsub('\\d+$', '', names(fl))))
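As a rough check of how this approach behaves, the sketch below runs the same loop on a small made-up list (`toy`, `flatten_dfs`, `flat` and `grouped` are illustrative names, not part of the answer). Each data frame is wrapped in list() before the one-level unlist() so that it is not split into columns, and unlist() builds the element names from the path, numbering unnamed elements, which is why trailing digits are stripped before grouping.

```r
# Hypothetical toy structure with data frames at different depths.
toy <- list(
  a = list(data.frame(x = 1:2), data.frame(x = 3:4),
           b = list(d = list(data.frame(y = 1:2), data.frame(y = 3:4)))),
  b = list(c = list(data.frame(z = 1:2), data.frame(z = 3:4)))
)

# Same loop as in the answer, under a different name to avoid clashes.
flatten_dfs <- function(x) {
  # repeat while any element is still a plain list (data frames do not count)
  while (any(vapply(x, inherits, logical(1L), "list"))) {
    # protect data frames by wrapping them, then remove one level of nesting
    x <- lapply(x, function(xx) if (inherits(xx, "list")) xx else list(xx))
    x <- unlist(x, recursive = FALSE)
  }
  x
}

flat <- flatten_dfs(toy)
names(flat)  # path-like names with a trailing position number

# Group by the path with the trailing digits removed.
grouped <- split(flat, gsub("\\d+$", "", names(flat)))
str(grouped, max.level = 1)
```

One caveat of this design: if a real list name ends in digits (for example a hypothetical `run1`), the regular expression strips those digits as well, so such groups could be merged or renamed unintentionally.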

CodePudding user response:

I don't know how to achieve this using standard flattening functions but designing an algorithm that can do it is pretty straightforward. You just go through the structure of nested lists and keep only those that have no other list as a child.

find_last_lists <- function(lst, parent.names=NULL) {
  
  # return 'lst' if it has no items that are lists
  if (!any(sapply(lst, is.list))) {
    
    setNames(list(lst), 
             parent.names[[length(parent.names)-1]])
    
  # otherwise go through all items recursively 
  } else {
    
    df.list <- NULL
    for (i in seq_along(lst)) {
      
      df.list <- c(df.list, 
                   find_last_lists(lst[[i]], 
                                   c(parent.names, list(names(lst)[i]))))
    }
    
    df.list
  }
}

It is basically a depth-first traversal of a tree for which I used a recursive function (a non-recursive solution would be possible as well). parent.names stores the sequence of names of parent list items.
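For comparison, here is a minimal sketch of what the non-recursive variant mentioned above might look like, using an explicit stack. It is not the answer's code: `collect_dfs` and `toy` are hypothetical names, and the sketch assumes that data frames are the leaves of interest, detecting them with is.data.frame() instead of checking whether an item has list children. Each data frame is named after the list that directly contains it (an empty string at the top level).

```r
# Iterative depth-first traversal with an explicit stack (illustrative sketch).
collect_dfs <- function(x) {
  # each stack entry holds a node, its own name, and the name of its parent list
  stack <- list(list(node = x, name = "", parent = ""))
  out <- list()
  while (length(stack) > 0) {
    # pop the last entry
    item <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    if (is.data.frame(item$node)) {
      # leaf: store the data frame under the name of its containing list
      out <- c(out, setNames(list(item$node), item$parent))
    } else if (is.list(item$node)) {
      nms <- names(item$node)
      if (is.null(nms)) nms <- rep("", length(item$node))
      # push children in reverse so they are popped in their original order
      for (i in rev(seq_along(item$node))) {
        stack[[length(stack) + 1L]] <- list(node = item$node[[i]],
                                            name = nms[i],
                                            parent = item$name)
      }
    }
  }
  out
}

# Hypothetical toy data for a quick check.
toy <- list(
  a = list(data.frame(x = 1:2), data.frame(x = 3:4),
           b = list(d = list(data.frame(y = 1:2), data.frame(y = 3:4)))),
  b = list(c = list(data.frame(z = 1:2), data.frame(z = 3:4)))
)
res <- collect_dfs(toy)
names(res)
```

The result has the same flat, repeated-name shape as the recursive version, so the same grouping step can be applied to it afterwards.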

fl <- find_last_lists(my_list)
# List of 9
# $ a:'data.frame': 100 obs. of  3 variables:
#   ..$ a: num [1:100] 3 2 2 2 1 3 1 3 1 2 ...
#   ..$ b: num [1:100] 3 3 1 2 2 1 3 3 2 2 ...
#   ..$ c: num [1:100] 2 1 1 2 4 1 2 3 3 3 ...
# $ a:'data.frame': 100 obs. of  3 variables:
#   ..$ a: num [1:100] 1 1 1 1 1 1 1 2 2 2 ...
#   ..$ b: num [1:100] 2 4 4 2 1 1 2 3 3 4 ...
#   ..$ c: num [1:100] 1 1 3 3 2 1 2 3 1 3 ...
# $ a:'data.frame': 100 obs. of  3 variables:
#   ..$ a: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ b: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ c: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
# $ e:'data.frame': 100 obs. of  3 variables:
#   ..$ g: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ h: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ i: num [1:100] 1 2 2 1 1 1 1 1 2 2 ...
# $ e:'data.frame': 100 obs. of  3 variables:
#   ..$ g: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ h: num [1:100] 1 2 2 1 1 2 2 2 1 1 ...
#   ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
# $ e:'data.frame': 100 obs. of  3 variables:
#   ..$ g: num [1:100] 1 1 2 1 2 2 3 3 3 2 ...
#   ..$ h: num [1:100] 2 1 1 1 2 2 1 2 2 2 ...
#   ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
# $ c:'data.frame': 100 obs. of  3 variables:
#   ..$ d: num [1:100] 2 1 3 3 3 4 4 4 3 3 ...
#   ..$ e: num [1:100] 2 3 3 3 3 3 3 2 3 3 ...
#   ..$ f: num [1:100] 4 1 1 1 2 1 2 4 4 3 ...
# $ c:'data.frame': 100 obs. of  3 variables:
#   ..$ d: num [1:100] 4 1 1 3 4 4 4 4 4 2 ...
#   ..$ e: num [1:100] 4 3 4 2 4 4 2 4 2 4 ...
#   ..$ f: num [1:100] 3 1 2 2 2 1 3 3 2 3 ...
# $ c:'data.frame': 100 obs. of  3 variables:
#   ..$ d: num [1:100] 1 1 4 3 3 1 1 2 2 1 ...
#   ..$ e: num [1:100] 2 1 1 3 3 1 1 1 1 3 ...
#   ..$ f: num [1:100] 1 3 2 2 4 4 1 3 3 2 ...

The result is a list of data frames which can be further grouped and reordered into your desired format as follows:

fl <- tapply(fl, names(fl), unname)
fl <- fl[order(names(fl))]
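If a plain named list is preferred for the grouped result, one possible alternative to the tapply() call is split() followed by unname(). The sketch below uses a small made-up flat list (`flat_example` and `grouped` are illustrative names). Because split() converts the names to a factor, the groups come out ordered by the sorted names, so a separate reordering step should not be needed here.

```r
# Hypothetical flat list with repeated names, like the output of find_last_lists().
flat_example <- list(
  a = data.frame(x = 1),
  e = data.frame(y = 1),
  a = data.frame(x = 2),
  c = data.frame(z = 1)
)

# Group by name, then drop the now-redundant names inside each group.
grouped <- lapply(split(flat_example, names(flat_example)), unname)

names(grouped)    # group names in sorted order
lengths(grouped)  # number of data frames per group
```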