I'm working on a simple machine learning project. My initial model was overfitting, which I worked out by reading up on what overfitting is and how to detect it. I then applied SMOTE to reduce the overfitting and tried to check whether the model still overfits. I'm getting a graph that I can't interpret; I went through several links to understand what is happening, but without success. Can anyone tell me whether this graph is okay or whether something is wrong with it? (The picture and code are given below.)
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

def EF_final(x_train, y_train, x_test, y_test):
    train_scores, test_scores = [], []
    values = [i for i in range(1, 21)]
    # evaluate a decision tree for each depth
    for i in values:
        # configure the model
        model_ef = ExtraTreesClassifier(n_estimators=80, random_state=42, min_samples_split=2, min_samples_leaf=1, max_features='sqrt', max_depth=24, bootstrap=False)
        # fit model on the training dataset
        model_ef.fit(x_train, y_train)
        # evaluate on the train dataset
        train_yhat = model_ef.predict(x_train)
        train_acc = accuracy_score(y_train, train_yhat)
        train_scores.append(train_acc)
        # evaluate on the test dataset
        test_yhat = model_ef.predict(x_test)
        test_acc = accuracy_score(y_test, test_yhat)
        test_scores.append(test_acc)
        # summarize progress
        print('>%d, train: %.3f, test: %.3f' % (i, train_acc, test_acc))
    # plot of train and test scores vs tree depth
    plt.plot(values, train_scores, '-o', label='Train')
    plt.plot(values, test_scores, '-o', label='Test')
    plt.legend()
    plt.show()
CodePudding user response:
I can't comment on the results of your model's predictions without seeing the data, but to answer your title question:
You configure and create the same model in every iteration of the loop, without using the variable i to change the model depth. The random_state of the model is constant as well, so you can expect the same result in every iteration.
Consider switching the model configuration line to:
    model_ef = ExtraTreesClassifier(n_estimators=80, min_samples_split=2, min_samples_leaf=1, max_features='sqrt', max_depth=i, bootstrap=False)
This will change the graph so that it helps you choose a better model. I can't comment on the accuracy, however, without knowing what kind of data is being passed in.
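To make the fix concrete, here is a minimal sketch of the depth sweep with max_depth tied to the loop variable. Several things in it are assumptions rather than part of the question: the helper name depth_sweep is hypothetical, the data is a synthetic stand-in generated with make_classification (the real dataset is not shown), and random_state=42 is kept so that depth is the only thing that changes between iterations (the line above drops it, which also works but makes runs non-reproducible).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def depth_sweep(x_train, y_train, x_test, y_test, depths):
    """Return (train_scores, test_scores), one accuracy per max_depth value."""
    train_scores, test_scores = [], []
    for depth in depths:
        # max_depth follows the loop variable; the seed stays fixed so that
        # depth is the only setting that differs between iterations
        model = ExtraTreesClassifier(n_estimators=80, random_state=42,
                                     max_features='sqrt', max_depth=depth,
                                     bootstrap=False)
        model.fit(x_train, y_train)
        train_scores.append(accuracy_score(y_train, model.predict(x_train)))
        test_scores.append(accuracy_score(y_test, model.predict(x_test)))
    return train_scores, test_scores


# Synthetic stand-in data (an assumption; the question's dataset is not shown)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

depths = list(range(1, 21))
train_scores, test_scores = depth_sweep(x_train, y_train, x_test, y_test, depths)
for d, tr, te in zip(depths, train_scores, test_scores):
    print('>%d, train: %.3f, test: %.3f' % (d, tr, te))
```

The two score lists can be passed to the same plt.plot calls as in the question. A train curve that keeps climbing while the test curve flattens or drops is the usual sign of overfitting at larger depths; two perfectly flat lines mean the depth is not actually being varied.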