What happens if the same hyperparameter is defined in both a pipeline and GridSearchCV?

I have defined a model's hyperparameter in my pipeline. The algorithm might run GridSearchCV during execution; if it does, what happens to the hyperparameter that was previously defined in the pipeline?

Pipeline

pipe_nusvc = Pipeline([('clfnu', NuSVC(nu=0.5, kernel='rbf'))])

I can set additional parameters using

pipe_nusvc['clfnu'].cache_size = 300

I thought about removing the added cache_size from the pipeline if GridSearchCV is used.

However, if GridSearchCV is used, the cache_size=300 defined in the pipeline remains in place, and I have found no way to remove it at run time. So what happens if the same hyperparameter is defined both in the pipeline and in GridSearchCV?

CodePudding user response:

You cannot "remove" cache_size; it is a parameter of the NuSVC object and always has some value. The closest you can get is resetting it to its default, which is 200.

If you set a value for a parameter (whether at initialization, using set_params, or manually as you have written), then that overrides the default, and it will stay that way until you change it.
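As a minimal sketch of the three ways of setting a value mentioned above, the snippet below reuses the pipeline from the question. The values 300 and 400 are arbitrary illustration values, and the `<step>__<parameter>` naming (`clfnu__cache_size`) is the usual scikit-learn convention for reaching into a pipeline step.

```python
from sklearn.pipeline import Pipeline
from sklearn.svm import NuSVC

pipe_nusvc = Pipeline([("clfnu", NuSVC(nu=0.5, kernel="rbf"))])

# Value before anything is overridden (the NuSVC default).
default_cache = pipe_nusvc.get_params()["clfnu__cache_size"]

# Set through the pipeline with the <step>__<parameter> naming convention.
pipe_nusvc.set_params(clfnu__cache_size=300)
after_set_params = pipe_nusvc["clfnu"].cache_size

# Set the attribute directly on the step, as in the question.
pipe_nusvc["clfnu"].cache_size = 400
after_manual = pipe_nusvc.get_params()["clfnu__cache_size"]

# "Removing" the value amounts to assigning the default again.
pipe_nusvc.set_params(clfnu__cache_size=NuSVC().cache_size)
after_reset = pipe_nusvc["clfnu"].cache_size

print(default_cache, after_set_params, after_manual, after_reset)
```

Each assignment simply replaces the previous value; nothing is stacked or remembered, so the last write is what the estimator uses.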

sklearn has a utility called "clone" (sklearn.base.clone): it returns a new estimator of the same type with the same parameters set, but with none of the fitted/learned attributes.
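The behaviour of clone can be checked with a small sketch like the one below. The random toy data and the use of the `support_` attribute as a "has been fitted" indicator are assumptions made only for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.pipeline import Pipeline
from sklearn.svm import NuSVC

# Toy data, only here so that the pipeline can be fitted.
rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = np.array([0, 1] * 20)

pipe_nusvc = Pipeline([("clfnu", NuSVC(nu=0.5, kernel="rbf", cache_size=300))])
pipe_nusvc.fit(X, y)

# clone copies the parameters but not anything learned during fit.
pipe_copy = clone(pipe_nusvc)

copy_cache = pipe_copy["clfnu"].cache_size
original_is_fitted = hasattr(pipe_nusvc["clfnu"], "support_")
copy_is_fitted = hasattr(pipe_copy["clfnu"], "support_")

print(copy_cache, original_is_fitted, copy_is_fitted)
```

The clone carries over cache_size=300 because it is a parameter, while the support vectors learned by the original are not copied.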

In a grid search, the estimator (the pipeline in this case) is cloned, and each clone is assigned one of the parameter combinations, overriding anything you may have set before. So if you set cache_size in the definition of the pipeline, or later as in your code, but you also include cache_size in the parameter grid, only the grid values matter inside the search. The original pipeline, outside the grid search, continues to exist independently and keeps the cache_size you set, but the search's cv_results_ and best_estimator_ reflect only the grid values. On the other hand, if cache_size is not in your parameter grid, then whatever value you set beforehand is used for every parameter combination in the search.
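Both cases can be seen in a small sketch like the following. The toy data, the grid values (100, 500 and the nu values) and cv=3 are assumptions chosen only to keep the example short; the parameter names follow the pipeline from the question.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import NuSVC

# Toy data, only here so that the searches can run.
rng = np.random.RandomState(0)
X = rng.randn(60, 3)
y = np.array([0, 1] * 30)

pipe_nusvc = Pipeline([("clfnu", NuSVC(nu=0.5, kernel="rbf"))])
pipe_nusvc["clfnu"].cache_size = 300

# Case 1: cache_size is in the grid, so the grid values override 300.
search_with = GridSearchCV(pipe_nusvc, {"clfnu__cache_size": [100, 500]}, cv=3)
search_with.fit(X, y)
best_cache_with = search_with.best_estimator_["clfnu"].cache_size
original_cache = pipe_nusvc["clfnu"].cache_size  # the original is untouched

# Case 2: cache_size is not in the grid, so the preset 300 is used throughout.
search_without = GridSearchCV(pipe_nusvc, {"clfnu__nu": [0.3, 0.5]}, cv=3)
search_without.fit(X, y)
best_cache_without = search_without.best_estimator_["clfnu"].cache_size

print(best_cache_with, original_cache, best_cache_without)
```

In the first case the best estimator ends up with one of the grid values and never 300, while the pipeline object passed to the search still reports 300. In the second case every candidate, including the best estimator, inherits 300.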