I ran the following code to fit a random forest model on a Kaggle dataset:
Data link: https://www.kaggle.com/arnavr10880/winedataset-eda-ml/data?select=WineQT.csv
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold,cross_val_score,GridSearchCV
from sklearn import linear_model
from sklearn.ensemble import RandomForestRegressor
import numpy as np
data= pd.read_csv("C:/Users/Downloads/Model Test Data.csv")
y=data.loc[: ,["y"]]
x=data.iloc[:,1:]
x_train, x_test,y_train, y_test = train_test_split(x,y)
rf=RandomForestRegressor()
params = {
    'n_estimators' : [300,500],
    'max_depth' : np.array([8,9,12]),
    'random_state' : [0],
}
scoring = ["neg_mean_absolute_error","neg_mean_squared_error"]
for score in scoring:
    print("score %s" % scoring)
    clf= GridSearchCV(rf,param_grid=params,scoring="%s" %score,verbose=False)
    clf.fit(x_train,y_train)
    print("Best parameters:")
    print(clf.best_params_)
    means=clf.cv_results_["mean_test_score"]
    stds=clf.cv_results_["std_test_score"]
    for mean,sd,params in zip(means,stds, clf.cv_results_["params"]):
        print("%0.3f (+/-%0.3f) for %r" %(mean,2*sd,params) )
However, I got the following error:
"Parameter grid for parameter (max_depth) needs to be a list or numpy array,
but got (<class 'int'>). Single values need to be wrapped in a list with one element."
Could anyone help me fix this?
Thank you.
Answer:
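When you run your example, the first score in the for loop runs just fine. After that, examining the params variable shows
{'max_depth': 12, 'n_estimators': 500, 'random_state': 0}
so you've accidentally overwritten the parameter grid with a single parameter combination (the last entry in clf.cv_results_["params"]). On the second pass through the outer loop, GridSearchCV receives this dict, whose values are plain integers instead of lists or arrays, and that is exactly what the error message complains about.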
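The overwrite can be reproduced without fitting any model. This minimal sketch assumes that iterating scikit-learn's ParameterGrid yields the same kind of per-candidate dicts that GridSearchCV stores in cv_results_["params"]; the names candidates and raised are hypothetical, and max_depth uses a plain list instead of np.array so the printed values are plain integers. Depending on the scikit-learn version, the rejection may surface as a TypeError or a ValueError, so both are caught.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the question (plain list for max_depth, see above).
params = {"n_estimators": [300, 500], "max_depth": [8, 9, 12], "random_state": [0]}

# One dict per candidate, each holding single values (hypothetical name).
candidates = list(ParameterGrid(params))

# Reusing `params` as the loop variable rebinds it on every pass.
for params in candidates:
    pass

print(params)  # {'max_depth': 12, 'n_estimators': 500, 'random_state': 0}

# Passing the overwritten dict back in as a grid is rejected.
raised = False
try:
    list(ParameterGrid(params))
except (TypeError, ValueError) as exc:
    raised = True
    print(f"{type(exc).__name__}: {exc}")
```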
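Looking again at your code, the overwrite happens in the inner loop at the end, which reuses params as its loop variable:
for mean,sd,params in zip(means,stds, clf.cv_results_["params"]):
    print("%0.3f (+/-%0.3f) for %r" %(mean,2*sd,params) )
A Python for loop rebinds its loop variable in the enclosing scope, so after this loop params no longer refers to your grid. Just use a different variable name here (for example candidate) and the grid survives every pass of the outer loop.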
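A corrected version of the loop could look like the sketch below. It is an illustration under assumptions: synthetic data from make_regression stands in for the CSV, and the grid uses fewer trees, shallower depths and cv=3 only so that it runs quickly. Besides renaming the loop variable to candidate (a hypothetical name), it prints the current metric score rather than the whole scoring list; the dict best is a hypothetical addition that only collects results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic regression data stands in for the Kaggle CSV.
x, y = make_regression(n_samples=120, n_features=5, noise=0.5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

rf = RandomForestRegressor()
# Smaller values than in the question, chosen only to keep the sketch fast.
params = {
    "n_estimators": [10, 20],
    "max_depth": np.array([3, 5]),
    "random_state": [0],
}

scoring = ["neg_mean_absolute_error", "neg_mean_squared_error"]
best = {}  # best parameter combination per metric (hypothetical helper)
for score in scoring:
    print("score %s" % score)  # the current metric, not the whole list
    clf = GridSearchCV(rf, param_grid=params, scoring=score, cv=3)
    clf.fit(x_train, y_train)
    best[score] = clf.best_params_
    print("Best parameters:")
    print(clf.best_params_)
    means = clf.cv_results_["mean_test_score"]
    stds = clf.cv_results_["std_test_score"]
    # `candidate` instead of `params`, so the grid above is not overwritten.
    for mean, sd, candidate in zip(means, stds, clf.cv_results_["params"]):
        print("%0.3f (+/-%0.3f) for %r" % (mean, 2 * sd, candidate))
```

After the loop, params still holds the original lists, so each call to GridSearchCV sees the full grid.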