I need to plot a conditional inference tree. I have selected the party::ctree() function. It works on the iris dataset.
library(party)
(irisct_party <- party::ctree(Species ~ .,data = iris))
plot(irisct_party)
But when I use random data:
library(wakefield)
set.seed(123)
n=200
studs <- data.frame(problem = factor(answer(n, x = c("No", "Yes"))),
age = round(runif(n, 18, 25)),
gender = factor(answer(n, x = c("M", "F" ))),
smoker = factor(answer(n, x = c("No", "Yes" ))),
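before = round(runif(n, 60, 80))
)
# "after" is added in a separate step because it refers to the "before" column;
# the "+" is an assumption, since the operator is missing from the post
studs$after <- studs$before + round(runif(n, 10, 20))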
(ct <- party::ctree(problem ~ ., data = studs))
plot(ct)
I see just
Conditional inference tree with 1 terminal nodes
Response: problem
Inputs: age, gender, smoker, before, after
Number of observations: 200
1)* weights = 200
Question. Why does the conditional inference tree have only 1 terminal node on random data?
Answer:
In each node (including the root node), ctree() conducts an independence test between the dependent variable (problem in your random data) and each of the explanatory variables (age, gender, smoker, before, after). It computes the p-value for each of the tests and selects the explanatory variable with the lowest p-value for splitting, but only if that p-value is significant at a given significance level (adjusted for testing multiple explanatory variables). In your data this is not the case because the dependent variable was, in fact, sampled independently of the explanatory ones. Therefore, the algorithm stops and does not split the root node.
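To see the stopping rule at work, here is a minimal sketch that fits two trees on the same simulated predictors: one with a response drawn independently of them, and one with a response that depends strongly on smoker. The data frame d, the column names problem_random and problem_linked, and the 0.9/0.1 probabilities are made up for illustration and are not taken from the question; the sketch uses partykit and base R sampling instead of wakefield.

```r
library(partykit)

set.seed(123)
n <- 200

# Simulated predictors (hypothetical, loosely modelled on the question's data)
d <- data.frame(
  age    = round(runif(n, 18, 25)),
  smoker = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Response sampled independently of all predictors
d$problem_random <- factor(sample(c("No", "Yes"), n, replace = TRUE))

# Response that depends strongly on smoker:
# P(problem = "Yes") is 0.9 for smokers and 0.1 for non-smokers
p_yes <- ifelse(d$smoker == "Yes", 0.9, 0.1)
d$problem_linked <- factor(
  ifelse(rbinom(n, size = 1, prob = p_yes) == 1, "Yes", "No"),
  levels = c("No", "Yes")
)

ct_random <- ctree(problem_random ~ age + smoker, data = d)
ct_linked <- ctree(problem_linked ~ age + smoker, data = d)

# width() returns the number of terminal nodes of a tree
width(ct_random)
width(ct_linked)
```

The tree for the dependent response should split on smoker, while the tree for the independent response will usually stay at a single terminal node. It can still split occasionally, because a test at the 5% level rejects a true null hypothesis about 5% of the time.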
Remarks: It is recommended to use the successor package partykit rather than party for fitting ctree(). See also the accompanying vignette("ctree", package = "partykit") for further details.
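As a rough sketch of what switching packages looks like, the iris example from the question can be refitted with partykit as follows. The control settings are written out only to make the significance level visible; to my understanding testtype = "Bonferroni" and alpha = 0.05 are the partykit defaults, so leaving out the control argument should give the same tree. The object name irisct_partykit is just illustrative.

```r
library(partykit)

# Same model as in the question, fitted with partykit instead of party.
# alpha is the significance level for the (Bonferroni-adjusted) split tests.
irisct_partykit <- ctree(
  Species ~ .,
  data    = iris,
  control = ctree_control(testtype = "Bonferroni", alpha = 0.05)
)

print(irisct_partykit)
plot(irisct_partykit)

# Number of terminal nodes in the fitted tree
width(irisct_partykit)
```

Note that party and partykit both export ctree(), so it is safer to load only one of them or to call the function with an explicit prefix such as partykit::ctree().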