I have a DataFrame whose columns have the following dtypes:
{Int64Dtype(), UInt8Dtype(), dtype('float64'), dtype('int64')}
When I try to fit xgb.XGBClassifier() I get the following error:
ValueError: DataFrame.dtypes for data must be int, float, bool or category. When
categorical type is supplied, DMatrix parameter `enable_categorical` must
be set to `True`. Invalid columns: NAME OF COLS THAT ARE UINT TYPE
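A DataFrame with the same dtype mix can be built like this (the column names here are made up for illustration):

```python
import pandas as pd

# Same dtype set as above: nullable Int64, nullable UInt8, float64, int64.
df = pd.DataFrame({
    "a": pd.array([1, None, 3], dtype="Int64"),  # Int64Dtype()
    "b": pd.array([1, 2, 3], dtype="UInt8"),     # UInt8Dtype()
    "c": [0.1, 0.2, 0.3],                        # dtype('float64')
    "d": [1, 2, 3],                              # dtype('int64')
})
print(set(df.dtypes))  # contains Int64Dtype(), UInt8Dtype(), float64, int64
```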
CodePudding user response:
Here's the code which raises that error:
def _invalid_dataframe_dtype(data: DataType) -> None:
    # pandas series has `dtypes` but it's just a single object
    # cudf series doesn't have `dtypes`.
    if hasattr(data, "dtypes") and hasattr(data.dtypes, "__iter__"):
        bad_fields = [
            str(data.columns[i])
            for i, dtype in enumerate(data.dtypes)
            if dtype.name not in _pandas_dtype_mapper
        ]
        err = " Invalid columns:" + ", ".join(bad_fields)
    else:
        err = ""

    type_err = "DataFrame.dtypes for data must be int, float, bool or category."
    msg = f"""{type_err} {_ENABLE_CAT_ERR} {err}"""
    raise ValueError(msg)
(Source.)
It references another variable, _pandas_dtype_mapper, which decides whether each dtype is supported: a column is valid only if its dtype's name appears as a key. Here's how that mapper is defined:
_pandas_dtype_mapper = {
    'int8': 'int',
    'int16': 'int',
    'int32': 'int',
    'int64': 'int',
    'uint8': 'int',
    'uint16': 'int',
    'uint32': 'int',
    'uint64': 'int',
    'float16': 'float',
    'float32': 'float',
    'float64': 'float',
    'bool': 'i',
    # nullable types
    "Int16": "int",
    "Int32": "int",
    "Int64": "int",
    "boolean": "i",
}
(Source.)
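You can see the lookup fail for yourself: a dtype's `.name` is what gets checked, and `"UInt8"` has no entry. A quick sketch (the key set below is copied from the mapper above, just without the values):

```python
import pandas as pd

# Keys of _pandas_dtype_mapper as quoted above.
mapper_keys = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64", "bool",
    "Int16", "Int32", "Int64", "boolean",
}

print(pd.Series([1], dtype="Int64").dtype.name in mapper_keys)  # True
print(pd.Series([1], dtype="UInt8").dtype.name in mapper_keys)  # False
```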
So here we find the problem: the mapper supports plain unsigned dtypes (uint8 through uint64), and it supports nullable signed dtypes (Int16 through Int64), but it has no entry for nullable unsigned dtypes such as UInt8 — exactly the UInt8Dtype() columns in your frame.
This suggests two possible workarounds:
- Use a nullable signed dtype (e.g. Int64) instead of a nullable unsigned one.
- Fill in the null values in that column, then convert it to a plain non-nullable dtype (e.g. uint8).
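Both workarounds can be sketched like this (the column name `col` is hypothetical):

```python
import pandas as pd

# A nullable unsigned column that would trigger the error.
df = pd.DataFrame({"col": pd.array([1, 2, None], dtype="UInt8")})

# Workaround 1: switch to a nullable *signed* dtype, which is in the mapper.
df_signed = df.assign(col=df["col"].astype("Int64"))
print(df_signed["col"].dtype.name)  # Int64

# Workaround 2: fill the nulls, then convert to a plain numpy dtype.
df_filled = df.assign(col=df["col"].fillna(0).astype("uint8"))
print(df_filled["col"].dtype.name)  # uint8
```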