I am using a Decision Tree Classifier. In my data, the target column is 'TARGET', which consists of 0s and 1s:
    TARGET
    0    282686
    1     24825
    dtype: int64
After training on 75% of the data, the model predicts 0 for every sample, and the accuracy_score on the training, validation, and test sets is above 0.90.
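With these class counts, a model that always predicts 0 already reaches about 0.919 accuracy, so a score above 0.90 is what you would expect even when every prediction is 0. The arithmetic, using only the counts shown above:

```python
# Class counts taken from the TARGET distribution in the question
counts = {0: 282686, 1: 24825}
total = sum(counts.values())

# Accuracy of a trivial model that always predicts the majority class (0)
baseline_accuracy = counts[0] / total

print(f"total samples: {total}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

Any accuracy near this baseline says little about whether the tree has learned to recognise class 1.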
Answer:
Evaluation:
Test accuracy is usually somewhat lower than training accuracy. The model has been fitted on the training data, so it has already seen those patterns, while the test data is unseen. A simple evaluation method is to compute the train and test accuracy and compare them.
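A minimal sketch of that comparison with scikit-learn. It assumes synthetic data from `make_classification` with a roughly 92%/8% class split as a stand-in for the real dataset, and the same 75% training fraction as in the question:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the real data (assumption: ~92% class 0)
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.92], random_state=0)

# Train on 75% of the data, as in the question
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Accuracy on data the model has seen vs. data it has not seen
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

An unrestricted decision tree typically fits its training set almost perfectly, so a gap between the two numbers is normal.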
Results:
- If test accuracy > train accuracy, something is probably wrong, so check the whole pipeline. The cause is most often the train/test split. Using a stratified split may help here, or try a different subset of the data.
- If train accuracy > test accuracy, the setup is most likely fine, and you can work on optimizing the model.
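A stratified split keeps the proportion of 0s and 1s the same in the training and test sets. A minimal sketch using the `stratify` argument of scikit-learn's `train_test_split`, again assuming synthetic imbalanced data in place of the real dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the real data (assumption: ~92% class 0)
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.92], random_state=0)

# stratify=y preserves the class ratio of y in both resulting subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, stratify=y, random_state=0
)

# Fraction of class 1 in each subset (labels are 0/1, so the mean is the rate)
print(f"class-1 rate, full data: {y.mean():.4f}")
print(f"class-1 rate, train:     {y_train.mean():.4f}")
print(f"class-1 rate, test:      {y_test.mean():.4f}")
```

The three rates should come out nearly identical; without `stratify`, the minority class can end up noticeably over- or under-represented in one of the subsets.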