Getting categorical related error when trying to fit XGBoost model when there are no categorical col


I have a data frame with the following column dtypes:

{Int64Dtype(), UInt8Dtype(), dtype('float64'), dtype('int64')}

When I try to fit xgb.XGBClassifier() I get the following error:

ValueError: DataFrame.dtypes for data must be int, float, bool or category.  When
categorical type is supplied, DMatrix parameter `enable_categorical` must
be set to `True`. Invalid columns: NAME OF COLS THAT ARE UINT TYPE
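
For context, the dtype names involved matter here: a pandas nullable unsigned column reports its name as "UInt8" (capitalized), while a plain numpy unsigned column reports "uint8". A minimal sketch with a hypothetical frame, no xgboost required:

```python
import pandas as pd

# Hypothetical frame for illustration: one nullable unsigned column,
# one plain numpy float column.
df = pd.DataFrame({
    "a": pd.array([1, 2, None], dtype="UInt8"),  # pandas nullable unsigned
    "b": [0.5, 1.5, 2.5],                        # plain numpy float64
})

# The nullable dtype's name is capitalized, unlike numpy's lowercase name.
print(df.dtypes["a"].name)  # → UInt8
print(df.dtypes["b"].name)  # → float64
```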

CodePudding user response:

Here's the code which raises the error:

def _invalid_dataframe_dtype(data: DataType) -> None:
    # pandas series has `dtypes` but it's just a single object
    # cudf series doesn't have `dtypes`.
    if hasattr(data, "dtypes") and hasattr(data.dtypes, "__iter__"):
        bad_fields = [
            str(data.columns[i])
            for i, dtype in enumerate(data.dtypes)
            if dtype.name not in _pandas_dtype_mapper
        ]
        err = " Invalid columns:" + ", ".join(bad_fields)
    else:
        err = ""

    type_err = "DataFrame.dtypes for data must be int, float, bool or category."
    msg = f"""{type_err} {_ENABLE_CAT_ERR} {err}"""
    raise ValueError(msg)

(Source.)

It references another variable, _pandas_dtype_mapper, which decides which dtype names are accepted. Here's how that is defined:

_pandas_dtype_mapper = {
    'int8': 'int',
    'int16': 'int',
    'int32': 'int',
    'int64': 'int',
    'uint8': 'int',
    'uint16': 'int',
    'uint32': 'int',
    'uint64': 'int',
    'float16': 'float',
    'float32': 'float',
    'float64': 'float',
    'bool': 'i',
    # nullable types
    "Int16": "int",
    "Int32": "int",
    "Int64": "int",
    "boolean": "i",
}

(Source.)

So, here we find the problem. The mapper supports numpy's unsigned datatypes ('uint8' through 'uint64'). It supports the nullable signed datatypes ('Int16', 'Int32', 'Int64'). But it has no entry for a nullable unsigned datatype such as 'UInt8', which is exactly what your UInt8Dtype column reports.
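
You can confirm the gap directly. Using the keys copied from the _pandas_dtype_mapper snippet above, numpy's lowercase 'uint8' name is present, but the name reported by a pandas nullable unsigned column is not:

```python
import pandas as pd

# Keys copied from the _pandas_dtype_mapper snippet above.
mapper_keys = {
    'int8', 'int16', 'int32', 'int64',
    'uint8', 'uint16', 'uint32', 'uint64',
    'float16', 'float32', 'float64', 'bool',
    'Int16', 'Int32', 'Int64', 'boolean',
}

numpy_uint = pd.Series([1, 2], dtype='uint8').dtype.name     # 'uint8'
nullable_uint = pd.Series([1, 2], dtype='UInt8').dtype.name  # 'UInt8'

print(numpy_uint in mapper_keys)     # → True
print(nullable_uint in mapper_keys)  # → False, hence the error
```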

This suggests two possible workarounds:

  1. Use int instead of uint.
  2. Fill in your null values in that column, and convert that column to a non-nullable datatype.
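
Both workarounds can be sketched on a hypothetical nullable unsigned column (the column name "x" is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"x": pd.array([1, 2, None], dtype="UInt8")})

# Workaround 1: cast to a nullable signed integer type, which the
# mapper does accept ('Int64' is in its key list). Nulls are preserved.
opt1 = df["x"].astype("Int64")

# Workaround 2: fill the nulls, then cast to a non-nullable numpy
# dtype. The fill value (0 here) is your choice and depends on the data.
opt2 = df["x"].fillna(0).astype("uint8")

print(opt1.dtype.name)  # → Int64
print(opt2.dtype.name)  # → uint8
```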