I have written a function that builds a decision tree. Its output L is a nested list in which each leaf is a data.frame holding the subset of the data that reaches the corresponding leaf of the tree. For example, if the tree has 3 leaves, the output of the function is a nested list L containing 3 data.frames:
List of 2
$ left :'data.frame': 5 obs. of 3 variables:
..$ X1: num [1:5] 0.884 0.875 1.175 1.053 0.858
..$ X2: num [1:5] 0.996 0.884 0.995 1.029 1.006
..$ y : num [1:5] 1 1 1 1 1
$ right:List of 2
..$ left :'data.frame': 5 obs. of 3 variables:
.. ..$ X1: num [1:5] 2.03 1.93 2.07 2.02 2.06
.. ..$ X2: num [1:5] 1.98 1.95 1.85 2.14 2.11
.. ..$ y : num [1:5] 3 3 3 3 3
..$ right:'data.frame': 5 obs. of 3 variables:
.. ..$ X1: num [1:5] 2.93 2.92 3.02 2.84 2.95
.. ..$ X2: num [1:5] 2.98 3.06 2.91 3.03 2.89
.. ..$ y : num [1:5] 2 2 2 2 2
How can I combine these data.frames into one data.frame in general, i.e. how can I unlist this list?
CodePudding user response:
Here's a recursive approach. It walks the list the way you would by hand: keep every data.frame it finds and descend into anything that is not one.
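# Global accumulator for the leaves. Reset it to list() before every
# top-level call, otherwise leaves from an earlier run are still in it.
flat <- list()
finder <- function(l) {
  for (element in l) {
    if (inherits(element, "data.frame")) {
      # A leaf: append it to the accumulator.
      flat <<- c(flat, list(element))
    } else {
      # A sub-list: descend into it.
      finder(element)
    }
  }
  return(flat)
}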
Once you have defined the above, you can combine the leaves with Reduce(rbind, finder(your_list)).
I'm not sure how to approach it without having to use <<-, so I would love feedback from those more knowledgeable than myself.
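One possible way to avoid <<- is to let each call return its own leaves and concatenate the results on the way back up. The sketch below is only an illustration: flatten_leaves is a hypothetical name, and the small tree reuses the first two rows of each leaf from the question.

```r
# Toy tree with the same shape as in the question (first two rows per leaf).
tree <- list(
  left = data.frame(X1 = c(0.884, 0.875), X2 = c(0.996, 0.884), y = 1),
  right = list(
    left  = data.frame(X1 = c(2.03, 1.93), X2 = c(1.98, 1.95), y = 3),
    right = data.frame(X1 = c(2.93, 2.92), X2 = c(2.98, 3.06), y = 2)
  )
)

# Hypothetical helper: returns a flat list of every data.frame below `node`.
flatten_leaves <- function(node) {
  # Test for a data.frame first, because a data.frame is itself a list.
  if (is.data.frame(node)) {
    return(list(node))
  }
  # Otherwise flatten each child and concatenate the resulting lists.
  do.call(c, lapply(node, flatten_leaves))
}

leaves <- flatten_leaves(tree)
# unname() keeps the list names from leaking into the row names.
combined <- do.call(rbind, unname(leaves))
```

Because nothing is stored outside the function, repeated calls do not interfere with each other. do.call(rbind, ...) binds all leaves in a single call, whereas Reduce(rbind, ...) binds them pairwise; both give the same rows here.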
CodePudding user response:
Assuming the maximum depth of the list is 2 (as shown in your example), you can bind all the rows of the depth-2 nodes with bind_rows() from dplyr, iterating over the top-level nodes with map_dfr() from purrr. In both cases you can use the .id argument to create a column that stores the name of the leaf and branch (or whatever you want to call them).
tree <- list(
  "left" = data.frame(
    X1 = c(0.884, 0.875, 1.175, 1.053, 0.858),
    X2 = c(0.996, 0.884, 0.995, 1.029, 1.006),
    y = 1
  ),
  "right" = list(
    "left" = data.frame(
      X1 = c(2.03, 1.93, 2.07, 2.02, 2.06),
      X2 = c(1.98, 1.95, 1.85, 2.14, 2.11),
      y = 3
    ),
    "right" = data.frame(
      X1 = c(2.93, 2.92, 3.02, 2.84, 2.95),
      X2 = c(2.98, 3.06, 2.91, 3.03, 2.89),
      y = 2
    )
  )
)
purrr::map_dfr(
  tree,
  function(x) {
    if (is.data.frame(x)) {
      x
    } else {
      dplyr::bind_rows(x, .id = "leaf")
    }
  },
  .id = "branch"
)
#>    branch    X1    X2 y  leaf
#> 1    left 0.884 0.996 1  <NA>
#> 2    left 0.875 0.884 1  <NA>
#> 3    left 1.175 0.995 1  <NA>
#> 4    left 1.053 1.029 1  <NA>
#> 5    left 0.858 1.006 1  <NA>
#> 6   right 2.030 1.980 3  left
#> 7   right 1.930 1.950 3  left
#> 8   right 2.070 1.850 3  left
#> 9   right 2.020 2.140 3  left
#> 10  right 2.060 2.110 3  left
#> 11  right 2.930 2.980 2 right
#> 12  right 2.920 3.060 2 right
#> 13  right 3.020 2.910 2 right
#> 14  right 2.840 3.030 2 right
#> 15  right 2.950 2.890 2 right
Created on 2022-04-19 by the reprex package (v2.0.1)
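If the tree can be deeper than 2 levels, the same idea can be sketched recursively in base R, recording the route to each leaf in a single column instead of separate branch and leaf columns. This is an illustration under stated assumptions rather than part of the answer above: collect_leaves and the path column are hypothetical names, every list node is assumed to be named, and all leaves are assumed to share the same columns.

```r
# Toy tree shaped like the one in the question (first two rows per leaf).
tree <- list(
  left = data.frame(X1 = c(0.884, 0.875), X2 = c(0.996, 0.884), y = 1),
  right = list(
    left  = data.frame(X1 = c(2.03, 1.93), X2 = c(1.98, 1.95), y = 3),
    right = data.frame(X1 = c(2.93, 2.92), X2 = c(2.98, 3.06), y = 2)
  )
)

# Hypothetical helper: binds every leaf below `node` into one data.frame
# and adds a `path` column such as "right/left".
collect_leaves <- function(node, path = NULL) {
  if (is.data.frame(node)) {
    # A leaf: tag each of its rows with the route taken to reach it.
    return(data.frame(path = rep(path, nrow(node)), node,
                      stringsAsFactors = FALSE))
  }
  # An inner node: recurse into each named child, extending the path.
  parts <- Map(
    function(child, name) {
      child_path <- if (is.null(path)) name else paste(path, name, sep = "/")
      collect_leaves(child, child_path)
    },
    node, names(node)
  )
  # unname() so the child names do not end up in the row names.
  do.call(rbind, unname(parts))
}

out <- collect_leaves(tree)
```

Keeping the whole route in one column means the result has the same columns at any depth; the path can be split on "/" afterwards if separate branch and leaf columns are needed.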