Which ML algorithm should I use for this dataset?

I have a dataset with features, let's say data1, data2, data3, ... The output to predict is a person's name, based on those features. I have a training dataset, but I am not sure which ML algorithm to use. The list of people's names does not change.

CodePudding user response：

It sounds like you are doing a classification task, so you should use a classification algorithm. Which one to use really depends on the quality and structure of your data and on the shape of its decision boundaries. Typically, before you embark on a classification task, you should check your data for outliers, noise, class imbalances, missing values and other data quality issues. From there, you can select the model that best suits your needs.
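As a rough illustration of those checks, here is a minimal sketch using pandas. The column names (data1, data2, name), the toy values and the helper summarize_quality are all hypothetical, and the 1.5 x IQR rule is just one common outlier heuristic, not the only option.

```python
import numpy as np
import pandas as pd


def summarize_quality(df: pd.DataFrame, target: str) -> dict:
    """Report missing values, class balance and IQR-based outlier counts."""
    numeric = df.drop(columns=[target]).select_dtypes(include="number")

    # Flag values further than 1.5 * IQR outside the quartiles (a common heuristic).
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "class_share": df[target].value_counts(normalize=True).to_dict(),
        "outliers_per_numeric_column": is_outlier.sum().to_dict(),
    }


# Hypothetical toy data standing in for the real training set.
df = pd.DataFrame({
    "data1": [1.0, 1.2, 0.9, 1.1, 50.0, np.nan],
    "data2": ["a", "b", "a", "a", "b", "a"],
    "name": ["alice", "alice", "alice", "alice", "bob", "alice"],
})

report = summarize_quality(df, target="name")
print(report)
```

A heavily skewed class_share or a large number of missing values or outliers is the kind of signal that should influence which model you try first.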

For example, if your data contains lots of outliers and missing values, a decision tree might be preferable. However, if you have a large class imbalance, an anomaly detection approach may be better suited. If your decision boundary is linear, you could use a support vector machine with a linear kernel. If your decision boundaries are non-linear, you'll need to look into more complex models such as Gaussian discriminative models, self-organizing maps, or neural networks.
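In practice, one way to choose between such candidates is to compare them empirically with cross-validation. The sketch below assumes scikit-learn and uses a synthetic dataset from make_classification as a stand-in for the real data; the three candidates and their settings are illustrative choices, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real training data (13 features, 3 classes).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, n_classes=3, random_state=0
)

# Candidate models; the SVM and the neural network get feature scaling.
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "linear_svm": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "neural_net": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

If the linear SVM scores about as well as the more flexible models, that suggests the decision boundary is close to linear and the simpler model is sufficient.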

In summary, it is entirely dependent on your data.

CodePudding user response：

So I now have a model with an accuracy of 97%. I am using a Random Forest classifier, and an OrdinalEncoder to fit and transform the data into numeric values. I save both objects to .pkl files.

Testing: I load both objects, and I have a TestData.csv in which the column to predict is empty. When I apply the ordinal encoder to it, I get an error saying that I am passing 13 features instead of 14. That is true, of course, since the test data does not contain the predicted values. How do I fix this?
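One likely cause, assuming the encoder was fitted on all 14 columns (the 13 features plus the name column), is that it then expects 14 columns every time it transforms. A possible fix is to fit the OrdinalEncoder on the feature columns only and to encode the target separately with a LabelEncoder. The sketch below uses two hypothetical feature columns (data1, data2) and made-up values in place of the real 13 features.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical training data: feature columns plus the target column "name".
train = pd.DataFrame({
    "data1": ["red", "blue", "red", "green"],
    "data2": ["s", "m", "l", "m"],
    "name": ["alice", "bob", "alice", "carol"],
})
feature_cols = ["data1", "data2"]

# Fit the feature encoder on the feature columns ONLY, never on the target.
feature_encoder = OrdinalEncoder(
    handle_unknown="use_encoded_value", unknown_value=-1
)
X_train = feature_encoder.fit_transform(train[feature_cols])

# Encode the target separately so predictions can be mapped back to names.
label_encoder = LabelEncoder()
y_train = label_encoder.fit_transform(train["name"])

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Test data has no target column, which now matches what the encoder expects.
test = pd.DataFrame({"data1": ["red", "blue"], "data2": ["s", "m"]})
X_test = feature_encoder.transform(test[feature_cols])
predicted_names = label_encoder.inverse_transform(model.predict(X_test))
print(predicted_names)
```

With this layout, three objects would be saved to .pkl: the feature encoder, the label encoder and the model. If TestData.csv does contain an empty target column, drop it before calling transform.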