I am trying to wrap my head around the n_iter parameter when using RandomizedSearchCV to tune the hyperparameters of an XGBRegressor model. Specifically, how does it work with the cv parameter?
Here's the code:
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV

# parameter distributions; scipy's uniform(loc, scale) samples from [loc, loc + scale]
params = {
    "colsample_bytree": uniform(0.7, 0.3),  # fraction of cols to sample
    "gamma": uniform(0, 0.5),  # min loss reduction required for next split
    "learning_rate": uniform(0.03, 0.3),  # default 0.1
    "max_depth": randint(2, 6),  # default 6, controls model complexity and overfitting
    "n_estimators": randint(100, 150),  # default 100
    "subsample": uniform(0.6, 0.4),  # % of rows to use in training sample
}
# model, X_train and y_train are defined earlier (not shown)
rsearch = RandomizedSearchCV(
    model, param_distributions=params, random_state=42,
    n_iter=200, cv=3, verbose=1, n_jobs=1, return_train_score=True,
)
# Fit model
rsearch.fit(X_train, y_train)
Fitting 3 folds for each of 200 candidates, totalling 600 fits
The documentation says n_iter is the number of parameter settings that are sampled, and the output log refers to them as candidates. What exactly does that mean?
CodePudding user response:
n_iter simply determines how many parameter settings (candidates) your randomized search will try in total.
Remember, this is not grid search; in params, you specify the distributions your parameters will be sampled from. But you need one more setting to tell the function how many settings to sample and try in total before concluding the search, and this setting is n_iter. That's why the function reports that n_iter candidates (i.e. specific parameter settings) were tried.
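To make "candidates" concrete, here is a minimal sketch that draws a few parameter settings from distributions like the ones in the question. It uses scikit-learn's ParameterSampler, which produces the same kind of samples RandomizedSearchCV evaluates. The reduced params dict and n_iter=5 are assumptions chosen to keep the output short; they are not the values from the question.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import ParameterSampler

# Illustrative subset of the distributions (assumed for brevity)
params = {
    "learning_rate": uniform(0.03, 0.3),  # samples floats from [0.03, 0.33]
    "max_depth": randint(2, 6),           # samples integers 2..5 (upper bound excluded)
}

n_iter = 5  # number of candidates to draw
candidates = list(ParameterSampler(params, n_iter=n_iter, random_state=42))

# Each candidate is one concrete parameter setting (a plain dict)
for i, candidate in enumerate(candidates):
    print(i, candidate)
```

Each printed dict is one candidate; with n_iter=200, the search would draw and evaluate 200 such dicts.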
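There is no direct relation between n_iter and the cv parameter; the latter determines how the performance of each iteration (candidate solution) is measured. Each of the n_iter candidates is evaluated with cv-fold cross-validation, which is why your log shows 200 candidates x 3 folds = 600 fits.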
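The sketch below illustrates how the two settings combine. To stay small and dependency-light it assumes a Ridge regressor on synthetic data as a stand-in for the XGBRegressor and training data in the question, with n_iter=4 and cv=3; the counting logic does not depend on the estimator.

```python
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data and a cheap estimator (stand-ins, not the question's setup)
X, y = make_regression(n_samples=60, n_features=5, noise=0.1, random_state=0)

n_iter, cv = 4, 3
search = RandomizedSearchCV(
    Ridge(),
    param_distributions={"alpha": uniform(0.1, 10)},
    n_iter=n_iter,
    cv=cv,
    random_state=42,
)
search.fit(X, y)

results = search.cv_results_
# One entry per sampled candidate
n_candidates = len(results["params"])
# One "splitK_test_score" column per cross-validation fold
split_keys = [k for k in results if k.startswith("split") and k.endswith("_test_score")]
total_fits = n_candidates * len(split_keys)
print(n_candidates, len(split_keys), total_fits)
```

cv_results_ holds one row per candidate and one score column per fold, so the number of cross-validation fits is n_iter times cv.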