Suppose that I want to create in R
a binary tree on the interval (0,1)
with maximum depth 3
in the following way:
First we have a pool of potential cut-offs for the tree t=c(0.1,0.2,0.3,0.4,0.5,0.6,0.7)
, a cut-off means that if we randomly choose the value 0.4
then we split the interval (0,1)
to (0,0.4)
and (0.4,1)
.
The steps that I want to do are:
1)Start with the whole interval (0,1)
2)Randomly choose a cut-off from t
, denoted as t_1
3)Split the interval (0,1)
based on the chosen cut-off i.e. to subintervals (0,t_1)
and (t_1,1)
4)Then randomly choose one of the intervals (0,t_1)
and (t_1,1)
5)For the chosen interval randomly sample from the cut-offs a point t_2
that makes sense, i.e. a point that it is not outside of the interval
6)Continue the procedure up to the point where we reach the maximum depth.
I'm totally clueless on where how to start. Is this the right forum to post such a question?
CodePudding user response:
Creating a tree structure like this requires a recursive function (i.e. a function that calls itself). The following function creates a list of nodes, where each branch node contains a split
value, and two daughter nodes called left
and right
. The leaf nodes contains the final range encompassed within the leaf.
make_node <- function(min = 0, max = 1, desired_depth = 3, depth = 0) {
if (depth < desired_depth) {
split <- runif(1, min, max)
list(split = split,
left = make_node(min = min, max = split, depth = depth 1),
right = make_node(min = split, max = max, depth = depth 1))
} else {
list(range = c(min, max))
}
}
It works like this. Let's create a reproducible tree:
set.seed(1)
tree <- make_node()
To get the initial splitting value, we do:
tree$split
#> [1] 0.2655087
So the right branch deals with all values between 0.2655087 and 1. To see where it splits this range, we do
tree$right$split
#> [1] 0.4136423
So this branch splits into values between [0.2655087, 0.4136423] on the left and [0.4136423, 1] on the right. Let's examine the left node:
tree$right$left$split
#> [1] 0.3985904
This has now split the [0.2655087, 0.4136423] branch into a left [0.2655087, 0.3985904] branch and a right [0.3985904, 0.4136423] branch.
If we take this right branch, have now reached depth 3, so we get the final range of this leaf and confirm its range:
tree$right$left$right
#> $range
#> [1] 0.3985904 0.4136423
Of course, to make all this easier you probably want some kind of function to walk the tree to classify a particular number.
walk_tree <- function(value, tree) {
result <- paste("Value:", value, "\n")
while(is.null(tree$range)) {
if(value >= tree$split) {
result <- paste(result, "\nGreater than split of", tree$split)
tree <- tree$right
} else {
result <- paste(result, "\nLess than split of", tree$split)
tree <- tree$left
}
}
result <- paste0(result, "\nValue falls into leaf node with range [",
tree$range[1], ",", tree$range[2], "]\n")
cat(result)
}
So, for example, we get
walk_tree(value = 0.4, tree)
#> Value: 0.4
#>
#> Greater than split of 0.2655086631421
#> Less than split of 0.413642294289884
#> Greater than split of 0.398590389362078
#> Value falls into leaf node with range [0.398590389362078,0.413642294289884]
You may prefer that this function returns a vector of 0s and 1s, or you may be looking for it to draw the tree, which is trickier to do, but possible.
Created on 2022-03-09 by the reprex package (v2.0.1)
CodePudding user response:
Perhaps we can use Reduce
to generate intervals in the binary-tree manner
Reduce(
function(interval, k) {
lb <- min(interval)
ub <- max(interval)
x <- v[v > lb & v < ub]
if (!length(x)) {
return(c(NA, NA))
}
p <- sample(x, 1)
list(c(lb, p), c(p, ub))[[sample(1:2, 1)]]
},
1:3,
init = c(0, 1),
accumulate = TRUE
)
and you will see the result like
[[1]]
[1] 0 1
[[2]]
[1] 0.0 0.6
[[3]]
[1] 0.0 0.2
[[4]]
[1] 0.0 0.1
which indicates the selected interval in each iteration from top to bottom.