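```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import OrdinalEncoder

# df (with a string pitch_name column) and the float feature matrix X are defined earlier
# OrdinalEncoder expects a 2-D array of shape (n_samples, n_features)
y = np.array(df.pitch_name).reshape(-1, 1)

# Map each pitch name to a float code: 0.0, 1.0, 2.0, ...
ord_enc = OrdinalEncoder()
y = ord_enc.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12345
)

knn_model = KNeighborsRegressor(n_neighbors=3)
knn_model.fit(X_train, y_train)
knn_model.predict(X_test[:1])  # predict the first test sample
```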
X contains only float values and y contains only strings. If I use OrdinalEncoder and predict with the model, it works, but the result is sometimes not a whole number (e.g. 6.3333) when I want the exact category.
However, whenever I fit the model with the raw string labels, I get this error: TypeError: cannot perform reduce with flexible type. Looking at the traceback, I suspect the error comes from line 238, where y_pred = np.mean(_y[neigh_ind], axis=1) is computed. Shouldn't it be the median, since y is a list of strings? Any help would be appreciated.
```
    237         if weights is None:
--> 238             y_pred = np.mean(_y[neigh_ind], axis=1)
    239         else:
    240             y_pred = np.empty((X.shape[0], _y.shape[1]), dtype=np.float64)
```
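A minimal reproduction of that line, assuming string labels like the question's (the pitch names below are made up): averaging strings has no meaning, so NumPy raises a TypeError. The exact message depends on the NumPy version; older releases print "cannot perform reduce with flexible type".

```python
import numpy as np

# Made-up string labels: 2 query points (rows) x 3 neighbours (columns)
neighbour_labels = np.array([
    ["Slider", "Slider", "Curveball"],
    ["Changeup", "Sinker", "Sinker"],
])

try:
    # The same kind of reduction as line 238: a mean across the neighbours
    np.mean(neighbour_labels, axis=1)
except TypeError as exc:
    # Wording varies across NumPy versions; the exception type is TypeError
    print("TypeError:", exc)
```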
Answer:
Forgive me if I misunderstood something, but it sounds like you are trying to perform classification with a regression model: KNeighborsRegressor.
From the scikit-learn documentation on nearest-neighbors regression, you can see that:
Neighbors-based regression can be used in cases where the data labels are continuous rather than discrete variables. The label assigned to a query point is computed based on the mean of the labels of its nearest neighbors.
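A small sketch on made-up continuous data that checks this behaviour: the regressor's prediction matches the plain average of the targets of the points returned by `kneighbors()`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up 1-D feature and continuous targets
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y = np.array([1.0, 2.0, 6.0, 50.0, 60.0])

reg = KNeighborsRegressor(n_neighbors=3).fit(X, y)

query = np.array([[0.9]])
pred = reg.predict(query)

# Find the 3 nearest training points and average their targets by hand
_, idx = reg.kneighbors(query)
manual = y[idx].mean(axis=1)

print(pred, manual)  # both are the mean of the neighbours' labels
```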
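So, by using the OrdinalEncoder, you just encoded the string categories as floats. However, the model then predicts the mean of the labels of the nearest neighbors, which is generally not an integer and thus not a category. For example, if the three nearest neighbors have codes 6, 6 and 7, the prediction is (6 + 6 + 7) / 3 ≈ 6.333.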
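A sketch of that exact situation on made-up data: three training points coded 6, 6 and 7. The regressor averages the codes, while a KNeighborsClassifier takes a majority vote and therefore always returns one of the existing codes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Made-up data: three training points whose ordinal-encoded labels are 6, 6, 7
X = np.array([[0.0], [0.1], [0.2]])
codes = np.array([6.0, 6.0, 7.0])
query = np.array([[0.1]])

reg = KNeighborsRegressor(n_neighbors=3).fit(X, codes)
print(reg.predict(query))  # mean of 6, 6, 7: about 6.333, not a valid code

clf = KNeighborsClassifier(n_neighbors=3).fit(X, codes)
print(clf.predict(query))  # majority vote among the neighbours: 6.0
```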
I suggest that you read this to learn how to use a KNeighborsClassifier.
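A minimal sketch of the classifier route, assuming float features and string labels shaped like the question's (the data and the pitch names "Slider" and "Fastball" are made up). Because the classifier votes instead of averaging, it accepts the raw strings and no OrdinalEncoder is needed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-ins for the question's float features X and string pitch_name labels
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
y = np.array(["Slider"] * 20 + ["Fastball"] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12345
)

# Majority vote among the 3 nearest neighbours, so string labels work directly
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X_train, y_train)

print(knn_clf.predict(X_test[:1]))    # a pitch name, not a fractional code
print(knn_clf.score(X_test, y_test))  # accuracy on the held-out split
```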