Convert a list with inconsistent naming to a data frame, with variable depth

Time:12-13

Consider the following list:

x <- list("a" = list("b", "c"),
          "d" = list("e", "f" = list("g", "h")),
          "i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))

It is worth noting that:

  • Names and elements are not necessarily single characters.
  • The depth of nesting is not known in advance.

Given x, my aim is to output the data frame:

y <- data.frame(
  main_level = c(rep("a", 2), rep("d", 3), rep("i", 4)),
  level1 = c("b", "c", "e", rep("f", 2), "j", rep("k", 3)),
  level2 = c(NA, NA, NA, "g", "h", NA, "l", "l", "l"),
  level3 = c(NA, NA, NA,  NA,  NA, NA, "m", "n", "n"), 
  level4 = c(NA, NA, NA,  NA,  NA, NA, NA, "o", "p")
)
> y
  main_level level1 level2 level3 level4
1          a      b   <NA>   <NA>   <NA>
2          a      c   <NA>   <NA>   <NA>
3          d      e   <NA>   <NA>   <NA>
4          d      f      g   <NA>   <NA>
5          d      f      h   <NA>   <NA>
6          i      j   <NA>   <NA>   <NA>
7          i      k      l      m   <NA>
8          i      k      l      n      o
9          i      k      l      n      p

NOTE that a typo was corrected in y above.

The above implies that there will be a variable number of columns as well, depending on the depth of the nesting.
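To make the column count concrete: for a list shaped like x, the number of columns equals the number of list levels along the deepest branch. A rough base-R sketch follows; list_depth is a made-up helper name, and it assumes every nested list carries a name, as in x.

```r
x <- list("a" = list("b", "c"),
          "d" = list("e", "f" = list("g", "h")),
          "i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))

# Hypothetical helper: number of list levels along the deepest branch.
# A non-list leaf (or an empty list) counts as 0.
list_depth <- function(l) {
  if (!is.list(l) || length(l) == 0) return(0L)
  1L + max(vapply(l, list_depth, integer(1)))
}

list_depth(x)  # 5: main_level plus level1 to level4
```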

The solutions for nested lists that I've found online assume either that the list naming structure is more or less consistent or that the list depth is identical throughout, and neither is the case here. For instance, the solutions at How to convert a nested lists to dataframe in R? and Converting nested list to dataframe do not apply because the lists there are much more consistent in their naming.

CodePudding user response:

Here's a way mainly relying on rrapply:

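rrapply::rrapply(x, how = "melt") |>
  apply(1, function(row){
    # keep only the entries that contain at least one letter
    newrow <- row[grep("[A-Za-z]", row)]
    # pad with NA up to the guessed number of columns
    length(newrow) <- purrr::vec_depth(x) - 1
    newrow
  }) |> 
  t() |> as.data.frame() |>
  # derive the level names from the depth instead of hard-coding 1:4
  `colnames<-`(c("main_level", paste0("level", seq_len(purrr::vec_depth(x) - 2))))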

output

  main_level level1 level2 level3 level4
1          a      b   <NA>   <NA>   <NA>
2          a      c   <NA>   <NA>   <NA>
3          d      e   <NA>   <NA>   <NA>
4          d      f      g   <NA>   <NA>
5          d      f      h   <NA>   <NA>
6          i      j   <NA>   <NA>   <NA>
7          i      k      l      m   <NA>
8          i      k      l      n      o
9          i      k      l      n      p

Note that this is still quite crude, and there might be a better way to reshape the output of rrapply. For instance, row[grep("[A-Za-z]", row)] drops every entry that contains no letter, so it will fail if a name or an element is, say, a purely numeric string. I have also not tested whether length(newrow) <- purrr::vec_depth(x) - 1 is a reliable way of guessing the number of columns, but it works here.
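To illustrate the caveat about the letter filter, here is a small sketch on a made-up row (the names L1, L2 and value, and the strings in it, are invented for the example and are not taken from the data above):

```r
# A made-up row of strings; "2" and "2020" contain no letters
row <- c(L1 = "a", L2 = "2", value = "2020")

# The filter keeps only entries with at least one letter,
# so the purely numeric strings are lost along with any placeholders
kept <- row[grep("[A-Za-z]", row)]
kept
```

Only the entry "a" survives, so legitimate numeric names or values would be dropped by this approach.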

CodePudding user response:

Here is a recursive function that makes no assumptions beyond the structure you described:

list_to_df <- function(l) {
  
  leaves <- list()
  
  go_deeper <- function(l, index=1, path=NULL) {

    # we can still go deeper    
    if (is.list(l[[index]])) {
      
      path <- c(path, names(l)[index])
      l <- l[[index]]
      
      lapply(seq_along(l), function(i) go_deeper(l, i, path))

    # this is the final node (leaf)      
    } else {
      
      leaves <<- c(leaves, list(c(path, l[[index]])))
    }
  }
  
  # this saves the paths to each last node (leaf) in 'leaves' as a side effect
  go_deeper(list(l))
  
  # now just make a data frame from the 'leaves' list
  len.max <- max(lengths(leaves))
  leaves <- sapply(leaves, function(x) c(x, rep(NA, len.max-length(x))))
  leaves <- as.data.frame(t(leaves))
  names(leaves) <- c('main_level', paste0('level', seq_len(ncol(leaves)-1)))
  
  leaves 
}
list_to_df(x)
#   main_level level1 level2 level3 level4
# 1          a      b   <NA>   <NA>   <NA>
# 2          a      c   <NA>   <NA>   <NA>
# 3          d      e   <NA>   <NA>   <NA>
# 4          d      f      g   <NA>   <NA>
# 5          d      f      h   <NA>   <NA>
# 6          i      j   <NA>   <NA>   <NA>
# 7          i      k      l      m   <NA>
# 8          i      k      l      n      o
# 9          i      k      l      n      p
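
The same idea can also be written without the <<- side effect, by having the recursion return the paths directly. This is only a sketch: leaf_paths and paths_to_df are made-up names, the test list z is invented, and skipping empty names for unnamed sub-lists is an assumption about the desired behaviour.

```r
# Hypothetical helper: return one character vector per leaf,
# holding the names on the way down followed by the leaf value
leaf_paths <- function(l, path = character()) {
  nms <- names(l)
  if (is.null(nms)) nms <- rep("", length(l))
  out <- list()
  for (i in seq_along(l)) {
    el <- l[[i]]
    if (is.list(el)) {
      # assumption: an unnamed sub-list adds nothing to the path
      out <- c(out, leaf_paths(el, c(path, nms[i][nzchar(nms[i])])))
    } else {
      out <- c(out, list(c(path, as.character(el))))
    }
  }
  out
}

# Hypothetical helper: pad every path with NA and bind them row-wise
paths_to_df <- function(paths) {
  width <- max(lengths(paths))
  mat <- do.call(rbind, lapply(paths, `length<-`, width))
  df <- as.data.frame(mat)
  names(df) <- c("main_level", paste0("level", seq_len(width - 1)))
  df
}

# Invented example with multi-character names and a different depth
z <- list(alpha = list("one", beta = list("two", "three")),
          gamma = list("four"))
df_z <- paths_to_df(leaf_paths(z))
df_z
```

Because the column names are derived from the longest path, the number of columns follows the depth of whatever list is passed in.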