I have defined a hyperparameter of a model in my pipeline. The algorithm might use GridSearchCV on execution; if it does, what will happen to the hyperparameter previously defined in the pipeline?
Pipeline
pipe_nusvc = Pipeline([('clfnu', NuSVC(nu=0.5,kernel='rbf'))])
I can add parameters using:
pipe_nusvc['clfnu'].cache_size=300
I thought about removing the added cache_size from the pipeline if GridSearchCV is used.
However, if GridSearchCV is used, the cache_size=300 defined in the pipeline will still be there, and there is no way to remove it at run time. So what happens if the same hyperparameter is defined both in the pipeline and in GridSearchCV?
CodePudding user response:
You cannot "remove" cache_size; it is a parameter of the NuSVC object. You can only reset it to its default value, which is 200.
If you set a value for a parameter (whether at initialization, using set_params, or manually as you have written), then that overrides the default, and it will stay that way until you change it.
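As a rough sketch of the point above, the following reuses the pipeline from the question and sets cache_size in two of the ways mentioned (direct attribute assignment and set_params), then resets it to the default. The double-underscore name clfnu__cache_size is the usual step-name prefix convention for pipeline parameters.

```python
from sklearn.pipeline import Pipeline
from sklearn.svm import NuSVC

pipe_nusvc = Pipeline([('clfnu', NuSVC(nu=0.5, kernel='rbf'))])

# Nothing set yet, so the NuSVC default applies.
default_value = pipe_nusvc.get_params()['clfnu__cache_size']

# Set it manually on the step, as in the question.
pipe_nusvc['clfnu'].cache_size = 300
after_manual = pipe_nusvc.get_params()['clfnu__cache_size']

# Equivalent route through set_params, using the step-name prefix.
pipe_nusvc.set_params(clfnu__cache_size=400)
after_set_params = pipe_nusvc['clfnu'].cache_size

# There is no "unset"; the closest thing is assigning the default again.
pipe_nusvc.set_params(clfnu__cache_size=200)
after_reset = pipe_nusvc['clfnu'].cache_size

print(default_value, after_manual, after_set_params, after_reset)
```

Either route changes the same attribute on the same NuSVC object, which is why the value persists until it is explicitly changed again.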
sklearn has a utility called "clone": a new estimator of the same type is returned, with the same parameters set, but with none of the fitted/learned attributes.
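To illustrate what clone does, here is a small sketch. The synthetic data from make_classification is only an assumption made so that the estimator can be fitted; it is not part of the question.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

# Toy data, only used so that fit() has something to learn from.
X, y = make_classification(n_samples=60, random_state=0)

clf = NuSVC(nu=0.5, kernel='rbf', cache_size=300).fit(X, y)
clf_copy = clone(clf)

# The parameter you set is carried over to the clone...
print(clf_copy.cache_size)

# ...but the attributes learned during fit() are not.
original_is_fitted = hasattr(clf, 'support_vectors_')
copy_is_fitted = hasattr(clf_copy, 'support_vectors_')
print(original_is_fitted, copy_is_fitted)
```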
In a grid search, the estimator (the pipeline in this case) is cloned, and each clone is assigned one of the parameter combinations (overriding anything you may have set before). So if you set cache_size in the definition of the pipeline, or later as in your code, but you also set cache_size in the grid search, only the grid values will matter. The original pipeline, outside the grid search, will continue to exist independently and keep the cache_size you set, but the search's cv_results_ and best_estimator_ will reflect only the grid values. On the other hand, if you don't have cache_size in your parameter grid, then whatever value you set beforehand will be used for every parameter combination in the search.
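Both cases can be checked with a short experiment. This is only a sketch: the toy data, the grid values, and cv=3 are assumptions chosen to keep it quick, and the names search_with and search_without are just illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import NuSVC

# Toy data, assumed only for the purpose of running the search.
X, y = make_classification(n_samples=60, random_state=0)

pipe_nusvc = Pipeline([('clfnu', NuSVC(nu=0.5, kernel='rbf'))])
pipe_nusvc['clfnu'].cache_size = 300

# Case 1: cache_size is in the grid, so the grid values override 300.
search_with = GridSearchCV(
    pipe_nusvc, {'clfnu__cache_size': [100, 500]}, cv=3
).fit(X, y)

# Case 2: cache_size is not in the grid, so every candidate keeps 300.
search_without = GridSearchCV(
    pipe_nusvc, {'clfnu__nu': [0.3, 0.5]}, cv=3
).fit(X, y)

print(search_with.best_estimator_['clfnu'].cache_size)     # one of the grid values
print(search_without.best_estimator_['clfnu'].cache_size)  # the value set beforehand
print(pipe_nusvc['clfnu'].cache_size)                      # original pipeline is unchanged
```

Since cache_size only affects the kernel cache (memory and speed), not the fitted model, there is usually little reason to put it in the grid at all; setting it once on the pipeline is enough.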