Unlist lists of lists in R

Time:04-20

I have written a function that builds a decision tree. Its output, L, is a list of lists in which each "leaf" is the subset of the data that reaches the corresponding leaf of the tree. For example, if the tree has 3 leaves, the function returns a nested list L containing 3 data.frames:

List of 2
 $ left :'data.frame':  5 obs. of  3 variables:
  ..$ X1: num [1:5] 0.884 0.875 1.175 1.053 0.858
  ..$ X2: num [1:5] 0.996 0.884 0.995 1.029 1.006
  ..$ y : num [1:5] 1 1 1 1 1
 $ right:List of 2
  ..$ left :'data.frame':   5 obs. of  3 variables:
  .. ..$ X1: num [1:5] 2.03 1.93 2.07 2.02 2.06
  .. ..$ X2: num [1:5] 1.98 1.95 1.85 2.14 2.11
  .. ..$ y : num [1:5] 3 3 3 3 3
  ..$ right:'data.frame':   5 obs. of  3 variables:
  .. ..$ X1: num [1:5] 2.93 2.92 3.02 2.84 2.95
  .. ..$ X2: num [1:5] 2.98 3.06 2.91 3.03 2.89
  .. ..$ y : num [1:5] 2 2 2 2 2

How can I combine these data.frames into a single data.frame in general (for a tree of any depth), i.e. how can I unlist this list?

Answer:

Here's a recursive approach. It walks the list the way you would by hand: every element that is a data.frame is kept, and every element that is not is searched in turn.

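# `flat` lives in the global environment and collects every leaf found.
flat <- list()
finder <- function(l) {
    for (element in l) {
        if (inherits(element, "data.frame")) {
            # Leaf: append the data.frame, wrapped in list() so c() keeps it whole
            flat <<- c(flat, list(element))
        } else {
            # Inner node: recurse; its leaves are appended to `flat`
            finder(element)
        }
    }
    return(flat)
}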

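Once you have run the above, you can call it with Reduce(rbind, finder(your_list)). Because flat is never reset inside finder, run flat <- list() again before flattening a second tree; otherwise the leaves from the first call are included as well.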

I'm not sure how to approach this without using <<-, so I would love feedback from those more knowledgeable than me.
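One way to avoid <<- is to have each recursive call return the data.frames it found and let the caller concatenate them, so no global state is needed. The sketch below assumes, as in the question, that every leaf is a data.frame and every inner node is a plain list; flatten_frames is a hypothetical name, and the tree is the example data from the question.

```r
# Example tree from the question: one leaf on the left,
# two leaves under the right branch.
tree <- list(
  left = data.frame(
    X1 = c(0.884, 0.875, 1.175, 1.053, 0.858),
    X2 = c(0.996, 0.884, 0.995, 1.029, 1.006),
    y = 1
  ),
  right = list(
    left = data.frame(
      X1 = c(2.03, 1.93, 2.07, 2.02, 2.06),
      X2 = c(1.98, 1.95, 1.85, 2.14, 2.11),
      y = 3
    ),
    right = data.frame(
      X1 = c(2.93, 2.92, 3.02, 2.84, 2.95),
      X2 = c(2.98, 3.06, 2.91, 3.03, 2.89),
      y = 2
    )
  )
)

# Hypothetical helper: returns a flat, unnamed list of every data.frame in `l`.
# It keeps no global state, so it can be called repeatedly on different trees.
flatten_frames <- function(l) {
  # A data.frame is itself a list, so test for it before recursing.
  if (is.data.frame(l)) {
    return(list(l))
  }
  # Recurse into each child; each call returns a list of data.frames,
  # and do.call(c, ...) concatenates those lists into one.
  do.call(c, unname(lapply(l, flatten_frames)))
}

frames <- flatten_frames(tree)
# Bind all leaves in a single rbind() call.
result <- do.call(rbind, frames)
```

do.call(rbind, frames) binds all the leaves in one call, whereas Reduce(rbind, ...) rebinds the growing result once per leaf, which copies more data as the number of leaves grows.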

Answer:

Assuming the maximum depth of the list is 2 (as in your example), you can bind the rows of all the nodes at depth 2 with bind_rows() from dplyr, and iterate over the top-level nodes with map_dfr() from purrr. Both functions take an .id argument that creates a column storing the name of the leaf and of the branch (or whatever you want to call them).

tree <- list(
  "left" = data.frame(
    X1 = c(0.884, 0.875, 1.175, 1.053, 0.858),
    X2 = c(0.996, 0.884, 0.995, 1.029, 1.006),
    y = 1
  ),
  "right" = list(
    "left" = data.frame(
      X1 = c(2.03, 1.93, 2.07, 2.02, 2.06),
      X2 = c(1.98, 1.95, 1.85, 2.14, 2.11),
      y = 3
    ),
    "right" = data.frame(
      X1 = c(2.93, 2.92, 3.02, 2.84, 2.95),
      X2 = c(2.98, 3.06, 2.91, 3.03, 2.89),
      y = 2
    )
  )
)

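purrr::map_dfr(
  tree,
  function(x) {
    if (is.data.frame(x)) {
      # Top-level leaf: keep it as is
      x
    } else {
      # Branch: bind its leaves, storing each leaf's name in "leaf"
      dplyr::bind_rows(x, .id = "leaf")
    }
  },
  # Store each top-level node's name in "branch"
  .id = "branch"
)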
#>    branch    X1    X2 y  leaf
#> 1    left 0.884 0.996 1  <NA>
#> 2    left 0.875 0.884 1  <NA>
#> 3    left 1.175 0.995 1  <NA>
#> 4    left 1.053 1.029 1  <NA>
#> 5    left 0.858 1.006 1  <NA>
#> 6   right 2.030 1.980 3  left
#> 7   right 1.930 1.950 3  left
#> 8   right 2.070 1.850 3  left
#> 9   right 2.020 2.140 3  left
#> 10  right 2.060 2.110 3  left
#> 11  right 2.930 2.980 2 right
#> 12  right 2.920 3.060 2 right
#> 13  right 3.020 2.910 2 right
#> 14  right 2.840 3.030 2 right
#> 15  right 2.950 2.890 2 right

Created on 2022-04-19 by the reprex package (v2.0.1)
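The approach above handles exactly two levels. As a sketch of the general case the question asks about, a base-R recursion can follow any depth and record the route to each leaf in a single path column, instead of one .id column per level. flatten_with_path and the path column are hypothetical names; the sketch assumes every inner node is a named list and every leaf is a data.frame.

```r
# Example tree from the question.
tree <- list(
  left = data.frame(
    X1 = c(0.884, 0.875, 1.175, 1.053, 0.858),
    X2 = c(0.996, 0.884, 0.995, 1.029, 1.006),
    y = 1
  ),
  right = list(
    left = data.frame(
      X1 = c(2.03, 1.93, 2.07, 2.02, 2.06),
      X2 = c(1.98, 1.95, 1.85, 2.14, 2.11),
      y = 3
    ),
    right = data.frame(
      X1 = c(2.93, 2.92, 3.02, 2.84, 2.95),
      X2 = c(2.98, 3.06, 2.91, 3.03, 2.89),
      y = 2
    )
  )
)

# Hypothetical helper: flattens a tree of any depth into one data.frame,
# adding a `path` column such as "right/left" that records which
# branches were followed to reach each leaf.
flatten_with_path <- function(node, path = character(0)) {
  if (is.data.frame(node)) {
    # Leaf: tag every row with the path taken to reach it.
    node$path <- rep(paste(path, collapse = "/"), nrow(node))
    return(node)
  }
  # Inner node: recurse into each named child, extending the path.
  parts <- Map(
    function(child, name) flatten_with_path(child, c(path, name)),
    node, names(node)
  )
  do.call(rbind, unname(parts))
}

result <- flatten_with_path(tree)
```

A single path column keeps the same set of columns for leaves at different depths, rather than leaving NA in a per-level column as the leaf column does for the top-level left leaf above.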
