I am reading Hands-On Machine Learning by Aurélien Géron. In the second chapter, on page 142, he gives the following parameter grid for hyperparameter tuning:
param_grid = [
{'preprocessing__geo__n_clusters': [5, 8, 10],
'random_forest__max_features': [4, 6, 8]},
{'preprocessing__geo__n_clusters': [10, 15],
'random_forest__max_features': [6, 8, 10]},
]
I think some combinations are repeated, or am I missing something?
CodePudding user response:
Yes, this grid contains duplicates: the combinations with n_clusters=10 and max_features of 6 or 8 appear in both dictionaries.
You can check by enumerating them:
from sklearn.model_selection import ParameterGrid
param_grid = [
{"preprocessing__geo__n_clusters": [5, 8, 10],
"random_forest__max_features": [4, 6, 8]},
{"preprocessing__geo__n_clusters": [10, 15],
"random_forest__max_features": [6, 8, 10]},
]
for params in ParameterGrid(param_grid=param_grid):
    print(params)
{'preprocessing__geo__n_clusters': 5, 'random_forest__max_features': 4}
...
{'preprocessing__geo__n_clusters': 10, 'random_forest__max_features': 6}
{'preprocessing__geo__n_clusters': 10, 'random_forest__max_features': 8}
{'preprocessing__geo__n_clusters': 10, 'random_forest__max_features': 6}
{'preprocessing__geo__n_clusters': 10, 'random_forest__max_features': 8}
...
{'preprocessing__geo__n_clusters': 15, 'random_forest__max_features': 10}
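If you would rather count the repeats than read them off the printout, something like the following should work. It is only a sketch: it expands the grid with itertools.product instead of ParameterGrid so that it runs without scikit-learn, and expand, duplicates and deduped_grid are names I made up for illustration.

```python
from collections import Counter
from itertools import product

param_grid = [
    {"preprocessing__geo__n_clusters": [5, 8, 10],
     "random_forest__max_features": [4, 6, 8]},
    {"preprocessing__geo__n_clusters": [10, 15],
     "random_forest__max_features": [6, 8, 10]},
]


def expand(grid):
    """Yield one dict per candidate (hypothetical stand-in for ParameterGrid)."""
    for sub_grid in grid:
        keys = sorted(sub_grid)
        for values in product(*(sub_grid[k] for k in keys)):
            yield dict(zip(keys, values))


combos = list(expand(param_grid))

# Dicts are not hashable, so count each combination as a sorted tuple of items.
counts = Counter(tuple(sorted(c.items())) for c in combos)
duplicates = [dict(key) for key, n in counts.items() if n > 1]

print(len(combos), "candidates,", len(counts), "unique")
for d in duplicates:
    print("duplicate:", d)

# One possible fix: a single-valued sub-grid per unique combination.
deduped_grid = [{k: [v] for k, v in key} for key in counts]
```

This reports 15 candidates but only 13 unique ones, so the two repeated settings are trained and scored twice. Since a param_grid may be a list of dictionaries, deduped_grid could be passed to the search in place of the original, although simply dropping 10 from the second n_clusters list or trimming the overlapping max_features values is probably easier to read.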
CodePudding user response:
(Alternate answer for people reading the Second Edition).
I think this was an error which was corrected in the 2nd Edition of Aurélien Géron's "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow."
The 2nd Edition describes Grid Search with cross validation on p. 76 of Chapter 2, writing:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
param_grid = [
{'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
{'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor()
grid_search = GridSearchCV(forest_reg, param_grid, cv=5, scoring='neg_mean_squared_error', return_train_score=True)
grid_search.fit(housing_prepared, housing_labels)
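Since bootstrap=True is the default, every combination from the first dictionary uses bootstrap=True, while the second dictionary sets bootstrap=False explicitly. The two sub-grids therefore cannot overlap, and this param_grid does not have the issue: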
{'max_features': 2, 'n_estimators': 3}
{'max_features': 2, 'n_estimators': 10}
...
{'bootstrap': False, 'max_features': 4, 'n_estimators': 3}
{'bootstrap': False, 'max_features': 4, 'n_estimators': 10}
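To double-check that claim, you can fill in the default for bootstrap and count repeats among the effective settings. Again this is just a sketch using itertools.product rather than ParameterGrid; expand and defaults are hypothetical names, and defaults only records the one default that matters here.

```python
from collections import Counter
from itertools import product

param_grid = [
    {"n_estimators": [3, 10, 30], "max_features": [2, 4, 6, 8]},
    {"bootstrap": [False], "n_estimators": [3, 10], "max_features": [2, 3, 4]},
]

# Assumption: the only relevant default is RandomForestRegressor's bootstrap=True.
defaults = {"bootstrap": True}


def expand(grid):
    """Yield one dict per candidate (hypothetical stand-in for ParameterGrid)."""
    for sub_grid in grid:
        keys = sorted(sub_grid)
        for values in product(*(sub_grid[k] for k in keys)):
            yield dict(zip(keys, values))


# Merge the default into each candidate so both sub-grids are comparable.
effective = [{**defaults, **combo} for combo in expand(param_grid)]

counts = Counter(tuple(sorted(e.items())) for e in effective)
n_duplicates = sum(n - 1 for n in counts.values())

print(len(effective), "candidates,", n_duplicates, "duplicates")
```

The first dictionary contributes 3 x 4 = 12 candidates and the second 1 x 2 x 3 = 6, and none of the 18 coincide once bootstrap is taken into account.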