I have an unbalanced dataset, so I get very poor performance from my classifier. It is a binary classification problem and I am using Random Forest as the classifier. The ratio of true negatives to true positives is 7:1. To fix this, I used a Subset Evaluator with Random Forest and BestFirst search to find the important attributes. I then kept only those attributes plus the class attribute, discarded all the others, and ran Random Forest on the reduced dataset again. Now the performance is even worse: the ratio of true negatives to true positives is about 12:1. I am using Weka for the entire process.
I would like to know: does an attribute evaluator work on unbalanced datasets?
Thank you.
Answer:
If a subset of attributes correlates highly with the majority class label, it is not surprising that keeping only that subset exacerbates the imbalance. After all, you are removing the attributes that correlate with the minority class label(s).
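One way to check whether this is what happened is to look at per-class recall rather than overall accuracy. The sketch below is plain Python, not Weka code, and every count in it is hypothetical: the numbers were chosen only so that the true negative to true positive ratio goes from 7:1 to 12:1 as in the question, and the false positive and false negative counts are assumptions, since the question does not give them. The helper name `summarize` is also made up for illustration.

```python
def summarize(tp, fn, tn, fp):
    """Return accuracy, per-class recall and the TN:TP ratio
    for a binary confusion matrix given as four counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "recall_pos": tp / (tp + fn),  # recall on the minority (positive) class
        "recall_neg": tn / (tn + fp),  # recall on the majority (negative) class
        "tn_to_tp": tn / tp,
    }


# Hypothetical counts for 700 negative and 100 positive instances.
# "before" = all attributes, "after" = only the selected attributes.
before = summarize(tp=90, fn=10, tn=630, fp=70)
after = summarize(tp=55, fn=45, tn=660, fp=40)

for name, stats in (("before", before), ("after", after)):
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

With these made-up counts, overall accuracy barely changes while recall on the minority class drops sharply, which is the pattern you would expect if the selected attributes mostly describe the majority class. In Weka, the confusion matrix in the classifier output gives you the four counts to plug in.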