Using Ordinal Variables as categories in XGBoost Python

Time:11-18

I am trying to train a multi-class classifier using XGBoost. The data contains 4 independent variables that are ordinal in nature. I want to use these variables as they are, because they are already encoded. The data looks like this:

Column name   Values
target        ['high', 'medium', 'low']
feature_1     values from 1 to 5
feature_2     values from 1 to 5
feature_3     values from 1 to 5
feature_4     values from 1 to 5
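To reproduce the setup without the original data, a frame with the same layout can be built from random values. This is a hypothetical stand-in (the names `rng` and `n_rows` and the row count are assumptions), not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: random values in the same layout,
# used only to reproduce the setup.
rng = np.random.default_rng(0)
n_rows = 200

# Four ordinal features, each an integer code from 1 to 5 (upper bound exclusive).
data = pd.DataFrame(
    {f"feature_{i}": rng.integers(1, 6, size=n_rows) for i in range(1, 5)}
)
# Three-class string target.
data["target"] = rng.choice(["high", "medium", "low"], size=n_rows)

print(data.dtypes)
```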

My code currently looks like this:

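import numpy as np
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

random_state = 42  # any fixed seed; the original value was not shown

# `data` is the DataFrame described above.
# Recent XGBoost versions expect integer class labels 0..n_classes-1.
y = LabelEncoder().fit_transform(data['target'])
X = data.drop(['target'], axis=1)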

X = X.fillna(0)
X = X.astype('int').astype('category')

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state=random_state, stratify=y)

# Create instance of model
xgb_model = XGBClassifier()

# Create the random grid
xgb_grid = {'n_estimators': [int(x) for x in np.linspace(start = 100, stop = 500, num = 5)],
            'max_depth': [3, 5, 8, 10],
            'learning_rate': [0.01, 0.05, 0.1, 0.2, 0.3]}

# Plain 'roc_auc' only handles binary targets; use the one-vs-rest variant for 3 classes
xgb_model_tuned = RandomizedSearchCV(estimator=xgb_model, param_distributions=xgb_grid, n_iter=50, cv=5,
                                     scoring='roc_auc_ovr', verbose=2, random_state=random_state, n_jobs=-1)

# Pass training data into model
xgb_model_tuned.fit(x_train, y_train)

I get the following error when I run this:

ValueError: DataFrame.dtypes for data must be int, float, bool or categorical.  When
                categorical type is supplied, DMatrix parameter
                `enable_categorical` must be set to `True`.feature_1, feature_2, 
                feature_3, feature_4

The dtype is category for all the variables. This worked well with RandomForestClassifier but not with XGBoost. If I cannot use the category dtype, how can I pass the ordinal variables as categories?

Answer:

You are almost there!

According to the XGBoost documentation, you need to set enable_categorical=True and use one of the tree methods that support categorical features: gpu_hist, approx, or hist.

# Create instance of model
xgb_model = XGBClassifier(tree_method="gpu_hist", enable_categorical=True)

Also, make sure that your XGBoost version is 1.5 or above.

Answer:

If you want them treated as ordinal, just keep the column type as int: XGBoost will make threshold splits as though the values were continuous, which preserves their order.
