I'm trying to learn data science by watching videos on YouTube. My current goal is to perform classification on a dataset. In this data file, there is a binary variable that indicates whether the plane is delayed or not ('DELAY'). The name of the airline the plane belongs to is a string ('CARRIER'), and the name of the airport the plane took off from is also a string ('AIRPORT'). So, how can I use the random forest classification model with my own data?
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)
I created X_train and y_train, but the code gives an error. It is probably because the airport and carrier names are strings.
CodePudding user response:
Your dataset needs to be pre-processed. Basically, all the string values/features in your data file need to be converted into numbers. Look into Ordinal Encoding and One-Hot Encoding.
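As a rough sketch of the One-Hot Encoding route, something like the following could work. The toy DataFrame here is made up to stand in for your real file (only the column names 'CARRIER', 'AIRPORT' and 'DELAY' come from the question), and the names `preprocess` and `model` are just illustrative. It assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; replace with your own file, e.g. via pd.read_csv(...)
df = pd.DataFrame({
    "CARRIER": ["AA", "DL", "UA", "AA", "DL", "UA", "AA", "DL"],
    "AIRPORT": ["JFK", "ATL", "ORD", "ATL", "JFK", "ORD", "ORD", "ATL"],
    "DELAY":   [1, 0, 1, 0, 0, 1, 1, 0],
})

X = df[["CARRIER", "AIRPORT"]]  # string features
y = df["DELAY"]                 # binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One-hot encode both string columns; categories that only appear in the
# test set are ignored (encoded as all zeros) instead of raising an error.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["CARRIER", "AIRPORT"])]
)

# Chain the encoding and the classifier so fit/predict apply both steps.
model = Pipeline([
    ("preprocess", preprocess),
    ("rfc", RandomForestClassifier(random_state=0)),
])

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

Putting the encoder inside a Pipeline means the same encoding learned on the training data is reused on the test data. If you would rather try Ordinal Encoding, swapping `OneHotEncoder` for `OrdinalEncoder` (also in `sklearn.preprocessing`) is the usual starting point, though it imposes an arbitrary order on the categories.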
CodePudding user response:
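If you don't want to encode them, you need to drop both columns, because scikit-learn models cannot be fit on raw strings. Keep in mind that dropping them discards whatever information those columns carry about delays.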