Feature selection with GridSearchCV


I am trying to use GridSearchCV to optimize a pipeline that performs feature selection first and KNN classification at the end. I have fitted the model on my data set, but when I look at the best parameters found by GridSearchCV, it only shows the best parameters for SelectKBest. I have no idea why it doesn't show the best parameters for KNN.

Here is my code.

Adding KNN and SelectKBest

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier()
parameters = {"classify__n_neighbors": list(range(5, 15)),
              "classify__p": [1, 2]}
sel = SelectKBest(f_classif)
param = {'kbest__k': [10, 20, 30, 40, 50]}

GridSearchCV with pipeline and parameter grid

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

model = GridSearchCV(Pipeline([('kbest', sel), ('classify', classifier)]),
                     param_grid=[param, parameters], cv=10)

Fitting the model

model.fit(X_new, y)

The result

print(model.best_params_)
{'kbest__k': 40}

CodePudding user response:

I believe that's an incorrect way of merging dicts. Try:

param_grid={**param,**parameters}

or (Python 3.9+):

param_grid=param|parameters
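To illustrate, merging the two dictionaries from the question produces a single grid that covers all three parameters at once:

```python
# The two parameter dictionaries from the question
param = {'kbest__k': [10, 20, 30, 40, 50]}
parameters = {'classify__n_neighbors': list(range(5, 15)),
              'classify__p': [1, 2]}

# Merging them yields one dict keyed by every pipeline parameter
merged = {**param, **parameters}
print(sorted(merged))
# ['classify__n_neighbors', 'classify__p', 'kbest__k']
```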

CodePudding user response:

When param_grid is a list, the disjoint union of the grids generated by each dictionary in the list is explored. So your search covers (1) the default k=10 selected features combined with every classifier parameter setting, and separately (2) the default classifier parameters combined with each value of k. That the best parameters show only k=40 means that having more features, even with the default classifier, performed best. You can check your cv_results_ to verify.
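The candidate counts implied by the two shapes can be checked with a short stdlib sketch (the two dictionaries are copied from the question):

```python
# Parameter dictionaries as defined in the question
param = {'kbest__k': [10, 20, 30, 40, 50]}
parameters = {'classify__n_neighbors': list(range(5, 15)),
              'classify__p': [1, 2]}

def n_candidates(grid):
    # Number of combinations in one dict-style grid (Cartesian product)
    total = 1
    for values in grid.values():
        total *= len(values)
    return total

# A list of dicts is a disjoint union: 5 + (10 * 2) = 25 candidates
list_total = n_candidates(param) + n_candidates(parameters)
print(list_total)   # 25

# A merged dict is the full product: 5 * 10 * 2 = 100 candidates
merged_total = n_candidates({**param, **parameters})
print(merged_total)  # 100
```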

As dx2-66's answer notes, merging the dictionaries will generate the full grid you are probably after. You could also just define a single dictionary from the start.
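An end-to-end sketch with a single combined dictionary might look like the following. The synthetic dataset from make_classification and the specific grid values are stand-ins, not the asker's actual data or settings:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the asker's data
X, y = make_classification(n_samples=200, n_features=30, random_state=0)

pipe = Pipeline([('kbest', SelectKBest(f_classif)),
                 ('classify', KNeighborsClassifier())])

# One dictionary: every combination of k, n_neighbors, and p is tried
param_grid = {'kbest__k': [5, 10, 20],
              'classify__n_neighbors': [3, 5, 7],
              'classify__p': [1, 2]}

model = GridSearchCV(pipe, param_grid=param_grid, cv=3)
model.fit(X, y)

# best_params_ now reports a value for every tuned step
print(model.best_params_)
```

With this shape, best_params_ contains kbest__k, classify__n_neighbors, and classify__p together, which is what the question expected to see.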
