I'm training a classification model using the Pima Indians diabetes dataset. Here is a part of my code:
from sklearn.compose import ColumnTransformer
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Split into X and y
X = diabetes_data.drop(columns="Outcome")
y = diabetes_data["Outcome"]
# Transform the columns using StandardScaler:
st_scaler = StandardScaler()
cols = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
        'BMI', 'DiabetesPedigreeFunction', 'Age']
transformer = ColumnTransformer([("st_scaler", st_scaler, cols)])
X_transformed = transformer.fit_transform(X)
# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, test_size=0.3)
# Train the model
knn_clf = KNeighborsClassifier().fit(X_train, y_train)
X_test.shape  # (231, 8): 8 columns, the same as X_train
The problem is that when I try to visualize the decision boundary of my model, I get an error suggesting that my test data has a different number of features than my training data:
DecisionBoundaryDisplay.from_estimator(knn_clf, X_test, response_method="predict")
The error message:
[Error] ValueError: X has 2 features, but KNeighborsClassifier is expecting 8 features as input.
Answer:
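DecisionBoundaryDisplay draws a 2D plot: it builds a grid of points spanning two feature axes and passes that grid to your model's predict method, so the model must accept exactly 2 features. Your KNeighborsClassifier was trained on 8-dimensional data, so you can't plot it this way.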