Why is the model getting 100% accuracy for SVM, Random-forest Classifier and Logistic Regression?

Time:05-17

I'm using an existing disease prediction model to build a chatbot. While going through the model, I noticed that it reports an accuracy of 100%, and I'm not sure how or why that happens. I've attached a link to the code I'm referring to, along with a screenshot of the accuracy and output.

CodePudding user response:

Getting 100% accuracy on both the training and the test data probably means that your model is massively overfitting, most likely because you have so little data.

In general, you should avoid both overfitting and underfitting, because each of them hurts the performance of machine learning algorithms.
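One common way to check for overfitting is to compare accuracy on the training data with accuracy on data the model has never seen; a large gap between the two is the usual symptom. The sketch below is only an illustration under stated assumptions: it uses made-up one-dimensional data with noisy labels and a hand-written 1-nearest-neighbour "memorizer" (the helper names are hypothetical, not taken from the code in the question).

```python
import random


def nearest_neighbor_predict(train_points, train_labels, x):
    """Return the label of the training point closest to x (a 1-NN 'memorizer')."""
    best = min(range(len(train_points)), key=lambda i: abs(train_points[i] - x))
    return train_labels[best]


def accuracy(train_points, train_labels, points, labels):
    """Fraction of (point, label) pairs that the memorizer predicts correctly."""
    hits = sum(
        nearest_neighbor_predict(train_points, train_labels, x) == y
        for x, y in zip(points, labels)
    )
    return hits / len(points)


def make_noisy_data(n, rng, flip_prob=0.2):
    """Made-up data: 1-D points labelled by x > 0.5, with some labels flipped at random."""
    xs = [rng.random() for _ in range(n)]
    ys = [(x > 0.5) != (rng.random() < flip_prob) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_noisy_data(200, rng)
test_x, test_y = make_noisy_data(200, rng)

# Scoring on the training points themselves: every point is its own nearest neighbour.
train_acc = accuracy(train_x, train_y, train_x, train_y)
# Scoring on fresh points drawn the same way: the memorized noise no longer helps.
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The memorizer is perfect on the data it has stored but noticeably worse on fresh data, which is the pattern to look for when you suspect overfitting.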

CodePudding user response:

I think your model is overfitting.

CodePudding user response:

This is not necessarily an error, and it wouldn't be the first time someone has reached 100% accuracy on this dataset either.*

Still, I would recommend using both the training and the test set instead of Training.csv only. Have a look at this example to see that there are actually two standard splits provided as part of the benchmark dataset:

import pandas as pd
# Load the two standard splits that ship with the dataset
train_df = pd.read_csv('/kaggle/input/disease-prediction-using-machine-learning/Training.csv')
test_df = pd.read_csv('/kaggle/input/disease-prediction-using-machine-learning/Testing.csv')
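Once both splits are loaded, the idea is to fit on the training split only and report accuracy on the held-out split. Here is a minimal sketch of that, assuming scikit-learn is installed; the helper `evaluate_on_split`, the column name `"label"` and the toy frames are hypothetical stand-ins so the snippet runs without the Kaggle files. With the real data you would pass `train_df` and `test_df` and whatever the label column is called in the CSVs.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def evaluate_on_split(train_df, test_df, label_col):
    """Fit on the training split only, then score on both splits."""
    feature_cols = [c for c in train_df.columns if c != label_col]
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(train_df[feature_cols], train_df[label_col])
    train_acc = accuracy_score(train_df[label_col], model.predict(train_df[feature_cols]))
    test_acc = accuracy_score(test_df[label_col], model.predict(test_df[feature_cols]))
    return train_acc, test_acc


# Tiny made-up stand-ins for the two CSV files (symptom flags -> label).
toy_train = pd.DataFrame({
    "itching": [1, 1, 0, 0, 1, 0],
    "cough":   [0, 0, 1, 1, 0, 1],
    "label":   ["Allergy", "Allergy", "Cold", "Cold", "Allergy", "Cold"],
})
toy_test = pd.DataFrame({
    "itching": [1, 0],
    "cough":   [0, 1],
    "label":   ["Allergy", "Cold"],
})

train_acc, test_acc = evaluate_on_split(toy_train, toy_test, label_col="label")
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

The test accuracy is the number to report. If it stays at 100% on a split the model never saw during fitting, the perfect score is more likely a property of the dataset than a bug in the code.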

* Judging by other attempts at training a classifier on the given dataset.