How to check the model prediction with original values in machine learning-CodePudding

I have trained my model using machine learning and want to check with original value. Is i am doing it right? As whenever I change the numbers in 'value' getting the same result.

X_train, X_test, y_train, y_test = train_test_split(normalize(df4), y, test_size=0.2, random_state=0)

rfc1=RandomForestClassifier( random_state=0, max_features='auto', n_estimators= 90, max_depth=8, criterion='gini' )
rfc1.fit(X_train, y_train)

value=[2.60,1.0,3.0,19.0,1.0,1.0,1.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,1.0]
print(rfc1.predict([value]))

y_pred=rfc1.predict(X_test)

print(metrics.classification_report(y_test,y_pred))
df_cm = pd.DataFrame(confusion_matrix(y_test,y_pred), index = [i for i in ['Avg Marks','Good Marks','Bad Marks']],
                  columns = [i for i in ['Avg Marks','Good Marks','Bad Marks']])
plt.figure(figsize = (4,2))
sn.heatmap(df_cm, annot=True, fmt='g')

Model Accuracy is good but still getting always "Hight chances of Good Marks"

['High Chances of Good Marks']
                            precision    recall  f1-score   support

             Average Marks       0.80      0.83      0.82        59
 High Chances of Bad Marks       0.81      0.72      0.76        18
High Chances of Good Marks       0.86      0.86      0.86        50

                  accuracy                           0.83       127
                 macro avg       0.83      0.80      0.81       127
              weighted avg       0.83      0.83      0.83       127

Original data looks like this

CodePudding user response：

Got the half solution to my problem. I was normalizing the model before fit.

But the original data (value) is not normalize.

Below code works but but now my data which i am using for fit is not normalize. Any solution?

X_train, X_test, y_train, y_test = train_test_split(df4, y, test_size=0.2, random_state=0)

rfc1=RandomForestClassifier( random_state=0, max_features='auto', n_estimators= 90, max_depth=8, criterion='gini' )
rfc1.fit(X_train, y_train)

value=[2.60,1.0,3.0,19.0,1.0,1.0,1.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,1.0]
print(rfc1.predict([value]))


y_pred=rfc1.predict(X_test)

print(metrics.classification_report(y_test,y_pred))
df_cm = pd.DataFrame(confusion_matrix(y_test,y_pred), index = [i for i in ['Avg Marks','Good Marks','Bad Marks']],
                  columns = [i for i in ['Avg Marks','Good Marks','Bad Marks']])
plt.figure(figsize = (4,2))
sn.heatmap(df_cm, annot=True, fmt='g')

CodePudding user response：

First of all, you need to define yourself what you want. Normalize data or not.

I recommend you Normalize it.

If you normalize input data, you need to train your data with normalized values; and, you need to evaluated predictions with also normalized data.

You always have to reproduce use the same structure of input data in predictions that you used in the training

Take a look to this link enter link description here to refer the normalization.