I am trying to use GridSearchCV to optimize a pipeline that does feature selection at the beginning and classification using KNN at the end. I have fitted the model on my data set, but when I inspect the best parameters found by GridSearchCV, it only gives the best parameters for SelectKBest. I have no idea why it doesn't show the best parameters for KNN.
Here is my code.
Addition of KNN and SelectKBest

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier()
parameters = {"classify__n_neighbors": list(range(5, 15)),
              "classify__p": [1, 2]}
sel = SelectKBest(f_classif)
param = {'kbest__k': [10, 20, 30, 40, 50]}
GridSearchCV with pipeline and parameter grid

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

model = GridSearchCV(Pipeline([('kbest', sel), ('classify', classifier)]),
                     param_grid=[param, parameters], cv=10)
Fitting the model
model.fit(X_new, y)
The result
print(model.best_params_)
{'kbest__k': 40}
CodePudding user response:
That's an incorrect way of merging dicts, I believe. Try

param_grid={**param, **parameters}

or (Python 3.9+)

param_grid=param | parameters
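A minimal sketch of the merged grid, using a toy dataset (`make_classification`) and smaller parameter lists so it runs quickly; the pipeline step names match the question:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Toy data standing in for the asker's X_new, y
X, y = make_classification(n_samples=200, n_features=60, random_state=0)

param = {"kbest__k": [10, 20, 30]}
parameters = {"classify__n_neighbors": list(range(5, 8)),
              "classify__p": [1, 2]}

# One dict -> one grid: every k is combined with every n_neighbors and p
merged = {**param, **parameters}

model = GridSearchCV(
    Pipeline([("kbest", SelectKBest(f_classif)),
              ("classify", KNeighborsClassifier())]),
    param_grid=merged,
    cv=5,
)
model.fit(X, y)

# best_params_ now reports all three parameters, not just kbest__k
print(model.best_params_)
```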
CodePudding user response:
When `param_grid` is a list, the disjoint union of the grids generated by each dictionary in the list is explored. So your search is over (1) the default `k=10` selected features and every combination of classifier parameters, and separately (2) the default classifier parameters and each value of `k`. That the best parameters just show `k=40` means that having more features, even with the default classifier, performed best. You can check your `cv_results_` to verify.
As dx2-66 answers, merging the dictionaries will generate the full grid you are probably after. You could also just define a single dictionary from the start.
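The disjoint-union behaviour can be verified directly from `cv_results_`. A small sketch on a toy dataset (parameter values shrunk for speed): with a list of two dicts, the grid sizes add rather than multiply, and candidates from the first dict carry only the `kbest__k` key:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=60, random_state=0)

param = {"kbest__k": [10, 20, 30]}
parameters = {"classify__n_neighbors": [5, 6], "classify__p": [1, 2]}

model = GridSearchCV(
    Pipeline([("kbest", SelectKBest(f_classif)),
              ("classify", KNeighborsClassifier())]),
    param_grid=[param, parameters],  # list of dicts: two separate grids
    cv=5,
).fit(X, y)

# 3 candidates from the first dict + 2*2 from the second = 7,
# not the 3*2*2 = 12 a single merged dict would produce.
print(len(model.cv_results_["params"]))  # 7
```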