What exactly is the n_iter hyperparameter in RandomizedSearchCV?

I am trying to wrap my head around the n_iter parameter when using RandomizedSearchCV to tune the hyperparameters of an XGBRegressor model.

Specifically, how does it work with the cv parameter?

Here's the code:

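from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

model = XGBRegressor()  # assumed: constructed with default settings

# parameter distributions; scipy's uniform(loc, scale) samples from [loc, loc + scale]
params = {
    "colsample_bytree": uniform(0.7, 0.3),  # fraction of cols to sample, in [0.7, 1.0]
    "gamma": uniform(0, 0.5),  # min loss reduction required for next split
    "learning_rate": uniform(0.03, 0.3),  # step size, in [0.03, 0.33]
    "max_depth": randint(2, 6),  # integers 2..5 (upper bound is exclusive); controls model complexity and overfitting
    "n_estimators": randint(100, 150),  # integers 100..149
    "subsample": uniform(0.6, 0.4),  # fraction of rows sampled for training, in [0.6, 1.0]
}

rsearch = RandomizedSearchCV(
    model, param_distributions=params, random_state=42, n_iter=200,
    cv=3, verbose=1, n_jobs=1, return_train_score=True,
)

# Fit model (X_train and y_train are the training data, defined elsewhere)
rsearch.fit(X_train, y_train)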

Fitting 3 folds for each of 200 candidates, totalling 600 fits

The documentation says n_iter is the number of parameter settings that are sampled, and the output log refers to those n_iter settings as candidates. What exactly does that mean?

CodePudding user response：

n_iter simply sets how many parameter combinations (candidates) your randomized search will sample and evaluate in total.

Remember, this is not a grid search: in param_distributions you specify the distributions each parameter is sampled from, not a finite list of values to enumerate. The function therefore needs one more setting that tells it how many combinations to draw before concluding the search, and that setting is n_iter. That is why the log reports that n_iter candidates (i.e. specific parameter settings) are being fitted.
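To see what a single "candidate" looks like, here is a minimal sketch using scikit-learn's ParameterSampler, which draws settings from distributions in the same way. The two-parameter dictionary and n_iter=5 are assumptions chosen to keep the output short; they are not the settings from the question.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import ParameterSampler

# Hypothetical, shortened search space (only two of the question's parameters)
params = {
    "max_depth": randint(2, 6),      # integers 2..5
    "subsample": uniform(0.6, 0.4),  # floats in [0.6, 1.0]
}

# Draw n_iter=5 candidates; each one is a complete parameter setting
candidates = list(ParameterSampler(params, n_iter=5, random_state=42))

for candidate in candidates:
    print(candidate)  # a dict with one sampled value per parameter
```

Each printed dict is one candidate. With n_iter=200, as in the question, 200 such dicts would be drawn and each one evaluated.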

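There is no direct relation between n_iter and the cv parameter. The latter determines how the performance of each candidate is measured: with cv=3, every candidate is trained and scored on 3 different train/validation splits, and the scores are averaged. The two only combine in the total amount of work, which is n_iter × cv model fits; in your case 200 × 3 = 600, the number shown in your log.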
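The fit count in the log can be reproduced on a small scale. The sketch below is illustrative only: it swaps in DecisionTreeRegressor and synthetic data from make_regression so that it runs without xgboost, and uses n_iter=4 instead of 200. None of these choices come from the question.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption; replaces X_train / y_train)
X, y = make_regression(n_samples=120, n_features=5, noise=0.1, random_state=0)

# Stand-in search space for a decision tree (assumption)
params = {
    "max_depth": randint(2, 6),
    "min_samples_split": randint(2, 10),
}

search = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_distributions=params,
    n_iter=4,   # number of candidates sampled
    cv=3,       # number of folds each candidate is scored on
    random_state=42,
)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])  # one entry per candidate
n_folds = search.n_splits_                        # folds used per candidate
total_cv_fits = n_candidates * n_folds            # n_iter * cv

print(f"{n_candidates} candidates x {n_folds} folds = {total_cv_fits} fits")
```

cv_results_ holds one row per candidate and one score column per fold (split0_test_score, split1_test_score, ...), which shows the two parameters playing separate roles: n_iter sets the number of rows, cv the number of per-fold columns.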