UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5

Time:11-15

I'm trying to use grid search (GridSearchCV) with a Random Forest on a data frame. The code is below:

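import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Features are every column except the last; the target is the last column
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
x_cols = x.columns

# Splitting the dataset into the Training set and Test set
# (these splits are not used by the grid search below)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Standardization (fitted on the full feature matrix)
x = StandardScaler().fit_transform(x)
print(pd.DataFrame(x).head())

# Random Forest
rfc = RandomForestClassifier(random_state=42)
param_grid = {'n_estimators': [100, 200, 300], 'min_samples_split': [2, 3, 4, 5], 'max_depth': [4, 5, 6],
              'criterion': ['gini', 'entropy']}
# An integer cv with a classifier means stratified 5-fold cross-validation
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(x, y)

print(CV_rfc.best_params_)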

It gives me the following warning:

UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)

Can anyone please help me resolve this so that I can get the right parameters for Random Forest?

CodePudding user response:

According to the GridSearchCV documentation:

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used.

Since you asked for 5 splits, every class in y needs at least 5 members to appear in all of the splits. Your least populated class has only 1 member, which is what the warning reports. If you do not want to use stratified cross-validation, you can pass cv=KFold(5) instead, which creates 5 folds without stratification.
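
To see which classes are too small for the requested number of splits, you can count the members of each class before fitting. This is only a rough sketch: the helper name classes_below_n_splits and the labels in y_demo are made up for illustration and are not part of scikit-learn or of the question's data.

```python
from collections import Counter


def classes_below_n_splits(y, n_splits=5):
    """Return {class_label: count} for classes with fewer than n_splits members."""
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_splits}


# Made-up labels for illustration: class 2 appears only once
y_demo = [0] * 10 + [1] * 8 + [2]

rare = classes_below_n_splits(y_demo, n_splits=5)
print(rare)  # {2: 1}
```

Any class listed here cannot be spread across all 5 stratified folds, so it is a candidate for collecting more samples, merging with another class, or using fewer splits.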

The scikit-learn documentation has an example of using KFold splitting with GridSearchCV.
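
As a minimal sketch of the cv=KFold(5) suggestion (this is not the documentation's example), the snippet below runs a grid search on synthetic data where one class has a single member. X_demo, y_demo, and the reduced parameter grid are assumptions chosen so that it runs quickly; with the real data you would pass your own x and y and the full grid.

```python
import warnings

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the question's data frame (assumption):
# 60 rows, 4 features, and class 2 appearing only once
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(60, 4))
y_demo = np.array([0] * 30 + [1] * 29 + [2])

# Smaller grid than in the question, only to keep the sketch fast
param_grid = {"n_estimators": [10, 20], "max_depth": [4, 5]}

# Plain (unstratified) 5-fold splitter; shuffling avoids folds ordered by class
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid=param_grid, cv=cv)

# Record warnings so we can check that the stratification warning is gone
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    search.fit(X_demo, y_demo)

least_populated = [w for w in caught if "least populated class" in str(w.message)]
print(search.best_params_)
print(len(least_populated))
```

Keep in mind that without stratification a very rare class can be missing from the training portion of some folds, so the scores for that class may be unreliable.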
