Dataset for a Python application

Time:11-29

I am working on an application that predicts a disease from its symptoms, but I am having trouble building a dataset. If someone has a dataset for this, please share a Drive link to it here. I also have a question about a good model for this (scikit-learn only). I am currently using a decision tree classifier as the model for the project. Any suggestions are welcome. Thank you for reading.

Answer:

You can build your own dataset from this CSV template (one row per disease, followed by its symptoms):

    Sickness,Symptom1,Symptom2,Symptom3,Symptom4
    Covid-19,Cough,Loss of taste,Fever,Chills
    Common Cold,Sneezing,Cough,Runny Nose,Headache

Then use pandas' read_csv to load the data. If you need more help, @mention me.
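A minimal sketch of loading that template with pandas, assuming the header lists Symptom1 through Symptom4 (one column per symptom slot). The CSV is embedded as a string via io.StringIO so the example runs on its own; with a real file you would pass its path (for example a hypothetical symptoms.csv) to pd.read_csv instead.

```python
import io

import pandas as pd

# The template embedded as a string so the example is self-contained.
# With a real file: df = pd.read_csv("symptoms.csv")  (hypothetical filename)
CSV_TEXT = """Sickness,Symptom1,Symptom2,Symptom3,Symptom4
Covid-19,Cough,Loss of taste,Fever,Chills
Common Cold,Sneezing,Cough,Runny Nose,Headache
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Target labels and the positional symptom columns
y = df["Sickness"]
symptom_cols = [c for c in df.columns if c.startswith("Symptom")]

print(df.shape)  # (2, 5)
print(symptom_cols)
```

Note that the symptom columns are positional slots, so "Cough" can appear under Symptom1 in one row and Symptom2 in another; a model needs them re-encoded before it can use them reliably.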

Answer:

I see that you are having trouble finding a dataset. I did a quick search and found this one on Kaggle. It would require preprocessing, since many of the symptom columns contain null values. You could restructure it so that each column is a specific symptom, with the value 1 if the symptom is present and 0 if it isn't. The drawback is that most of the values would be 0, so the data would be very sparse. You can try that and see if it works.
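The "one column per symptom, 1 or 0" encoding can be sketched with scikit-learn's MultiLabelBinarizer. This is an illustration under assumptions: the small DataFrame below is hypothetical toy data in the same layout (a disease column plus positional symptom columns, with NaN where a disease has fewer symptoms), not the Kaggle dataset itself.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: positional symptom columns, NaN where a row has fewer symptoms.
df = pd.DataFrame({
    "Disease":  ["Covid-19", "Common Cold", "Covid-19", "Common Cold"],
    "Symptom1": ["Cough", "Sneezing", "Fever", "Runny Nose"],
    "Symptom2": ["Loss of taste", "Cough", "Loss of taste", "Sneezing"],
    "Symptom3": ["Fever", "Runny Nose", np.nan, np.nan],
})

symptom_cols = [c for c in df.columns if c.startswith("Symptom")]

# Collect each row's non-null symptoms into a set, trimming stray whitespace.
symptom_sets = [
    {str(s).strip() for s in row if pd.notna(s)}
    for row in df[symptom_cols].itertuples(index=False)
]

# One column per distinct symptom (sorted): 1 if present in that row, else 0.
mlb = MultiLabelBinarizer()
X = pd.DataFrame(mlb.fit_transform(symptom_sets), columns=mlb.classes_)
y = df["Disease"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# A new patient must be encoded with the same fitted binarizer before predicting.
new_patient = pd.DataFrame(
    mlb.transform([{"Cough", "Fever", "Loss of taste"}]), columns=mlb.classes_
)
prediction = clf.predict(new_patient)
print(prediction)  # ['Covid-19']
```

Symptoms the binarizer never saw during fit are ignored by transform (scikit-learn emits a warning). If the mostly-zero matrix gets too large, MultiLabelBinarizer(sparse_output=True) returns a SciPy sparse matrix, which DecisionTreeClassifier also accepts.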

You can also look at another implementation in this link, which uses a Random Forest with very different preprocessing. A Random Forest is an ensemble of many decision trees, so it is often more accurate than a single tree. However, a single decision tree is more interpretable, if that is what you need.
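To choose between the two, you could compare them with cross-validation and, for the decision tree, print its learned rules with export_text. The sketch below uses randomly generated synthetic data: the symptom list, the "typical symptom" sets, and the 0.8/0.1 probabilities are made up for illustration and are not medical facts, so the scores say nothing about real-world accuracy. Swap in your own 0/1 symptom matrix X and labels y.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic, hypothetical data: a disease's "typical" symptoms appear with
# probability 0.8, every other symptom with probability 0.1.
symptoms = ["Cough", "Fever", "Loss of taste", "Sneezing", "Runny Nose", "Headache", "Chills"]
typical = {
    "Covid-19": {"Cough", "Fever", "Loss of taste", "Chills"},
    "Common Cold": {"Sneezing", "Cough", "Runny Nose", "Headache"},
    "Flu": {"Fever", "Chills", "Headache"},
}

rows, labels = [], []
for disease, core in typical.items():
    probs = np.array([0.8 if s in core else 0.1 for s in symptoms])
    for _ in range(100):
        rows.append((rng.random(len(symptoms)) < probs).astype(int))
        labels.append(disease)

X = pd.DataFrame(rows, columns=symptoms)
y = np.array(labels)

# max_depth keeps the single tree small enough to read; values are arbitrary.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold (stratified) cross-validated accuracy for each model
results = {}
for name, model in [("Decision tree", tree), ("Random forest", forest)]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {results[name]:.3f}")

# The fitted tree printed as human-readable if/else rules
tree.fit(X, y)
rules = export_text(tree, feature_names=symptoms)
print(rules)
```

The printed rules are what makes the single tree interpretable; a forest of 200 trees has no comparably compact summary, though its feature_importances_ attribute gives a rough idea of which symptoms matter most.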
