I am trying to count how many times each decision tree path (rule) is used to classify an instance.
For example, suppose I have the following rules (not sure if they make sense):
- Rule 1: [x<3 and y<5 => 'Low']
- Rule 2: [x<3 and x>1 and y<5 => 'Low']
- Rule 3: [x<3 and y>2 and y<5 => 'Low']
- Rule 4: [x<6 and y<8 => 'Medium']
- Rule 5: [x<10 and y<10 => 'High']
Now, suppose I have 10 test set samples. Given this test set and the rules above, I want output like this:
- Rule 1 has been used 2 times,
- Rule 2 has been used 2 times,
- Rule 3 has been used 1 time,
- Rule 4 has been used 3 times,
- and Rule 5 has been used 2 times
Does anyone know how to tackle this using Python please?
Thanks in advance for your help!
CodePudding user response:
If you're not familiar with it, I recommend the scikit-learn (sklearn) Python package, and more precisely its sklearn.tree.DecisionTreeClassifier class. See its API documentation and the user guide.
This page should help you solve your problem as it gives more detail about the decision process and how to retrieve the path used to classify a sample.
Sorry if this answer doesn't solve your problem right away but it should get you on the way :)
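To make that a bit more concrete, here is a minimal sketch of one way to count path usage with scikit-learn. It assumes the tree is a fitted DecisionTreeClassifier; the iris dataset, the max_depth value and the variable names are only stand-ins for your own data and model. Since every leaf corresponds to exactly one root-to-leaf path, counting the leaves returned by apply() counts how often each path was used:

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only: iris stands in for the asker's own dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0  # 10 test samples, as in the question
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# apply() returns, for each sample, the index of the leaf it ends up in.
# One leaf == one root-to-leaf path == one rule.
leaf_ids = clf.apply(X_test)
rule_usage = Counter(int(leaf) for leaf in leaf_ids)

for leaf, count in sorted(rule_usage.items()):
    # The class predicted at a leaf is the one with the largest value there.
    predicted = clf.classes_[clf.tree_.value[leaf][0].argmax()]
    print(f"leaf {leaf} (predicts class {predicted}) was used {count} time(s)")
```

If you also need the intermediate nodes each sample went through, clf.decision_path(X_test) returns them as a sparse indicator matrix, and sklearn.tree.export_text(clf) prints the tree so you can read off the conditions behind each leaf.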
CodePudding user response:
Do you want something like this:
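import random

# Ten random test samples; randint is inclusive on both ends.
x_num = [random.randint(1, 11) for _ in range(10)]
y_num = [random.randint(1, 11) for _ in range(10)]

def func(xn, yn):
    # Count how many samples each rule classifies.
    rule_1 = 0
    rule_2 = 0
    for x, y in zip(xn, yn):
        if x > 2 and y < 3:
            rule_1 += 1
        elif x < 4 and y > 2:
            rule_2 += 1
    return rule_1, rule_2

print(func(x_num, y_num))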
?
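The same idea can be sketched with the five rules from the question. This is only an illustration under a few assumptions: the rules are checked in order and the first match wins, and the sample points are made up. Note that Rule 2 and Rule 3 are subsets of Rule 1, so with first-match semantics they can never fire; in a real decision tree the paths are mutually exclusive, so this overlap would not occur.

```python
from collections import Counter

# Rules from the question, in priority order: (name, predicate, label).
RULES = [
    ("Rule 1", lambda x, y: x < 3 and y < 5, "Low"),
    ("Rule 2", lambda x, y: 1 < x < 3 and y < 5, "Low"),
    ("Rule 3", lambda x, y: x < 3 and 2 < y < 5, "Low"),
    ("Rule 4", lambda x, y: x < 6 and y < 8, "Medium"),
    ("Rule 5", lambda x, y: x < 10 and y < 10, "High"),
]

def count_rule_usage(samples):
    """Count, per rule, how many samples it is the first match for."""
    usage = Counter({name: 0 for name, _, _ in RULES})
    for x, y in samples:
        for name, predicate, _label in RULES:
            if predicate(x, y):
                usage[name] += 1
                break  # first matching rule wins
    return usage

# Hypothetical test samples, made up for illustration.
samples = [(1, 1), (2, 3), (2, 4), (4, 6), (5, 7),
           (5, 2), (7, 9), (9, 1), (2, 9), (12, 3)]
usage = count_rule_usage(samples)

for name, count in usage.items():
    print(f"{name} has been used {count} time(s)")
```

The last sample (12, 3) matches no rule at all, so the counts add up to 9 rather than 10.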