Pandas – ValueError: Cannot setitem on a Categorical with a new category, set the categories first

I've been searching for a solution to this for the past few hours. The relevant pandas documentation hasn't helped, and this solution gives me the same error.

I am trying to order my dataframe using a categorical in the following manner:

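from pandas.api.types import CategoricalDtype

# Ordered categorical: rows sort in the order the categories are listed
metabolites_order = CategoricalDtype(['Header', 'Metabolite', 'Unknown'], ordered=True)
df2['Feature type'] = df2['Feature type'].astype(metabolites_order)
df2 = df2.sort_values('Feature type')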

The "Feature type" column is populated with the categories correctly. This code runs perfectly in a Jupyter notebook, but when I run it in PyCharm, I get the following error:

Traceback (most recent call last):
  File "/Users/wasim.sandhu/Documents/MSDIALPostProcessor/postprocessor.py", line 138, in process_alignment_file
    df2.loc[4] = list(df2.columns)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 692, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1635, in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1700, in _setitem_with_indexer_split_path
    self._setitem_single_column(loc, v, pi)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1813, in _setitem_single_column
    ser._mgr = ser._mgr.setitem(indexer=(pi,), value=value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 568, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1846, in setitem
    self.values[indexer] = value
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py", line 211, in __setitem__
    value = self._validate_setitem_value(value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/categorical.py", line 1898, in _validate_setitem_value
    raise ValueError(
ValueError: Cannot setitem on a Categorical with a new category, set the categories first

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

What could be causing this? I believe that I've set the categories correctly...

CodePudding user response:

I'd suggest mapping these categories to integers and sorting on that column instead.

categories = ['Header', 'Metabolite', 'Unknown']
# Map each category name to its position in the desired order
feature_map = {name: rank for rank, name in enumerate(categories)}
df2['Feature Order'] = df2['Feature type'].map(feature_map)
df2 = df2.sort_values('Feature Order')
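
One caveat with this approach: Series.map with a dict returns NaN for any value that is missing from the dict, and sort_values places NaN rows last by default. A minimal sketch with made-up data (the frame below is an assumption, not the original dataset):

```python
import pandas as pd

categories = ["Header", "Metabolite", "Unknown"]
# Map each category name to its position in the desired order
feature_map = {name: rank for rank, name in enumerate(categories)}

# Hypothetical data that includes one value outside the mapping
df = pd.DataFrame(
    {"Feature type": ["Unknown", "Feature type", "Header", "Metabolite"]}
)

# Values missing from feature_map become NaN in the helper column
df["Feature Order"] = df["Feature type"].map(feature_map)

# NaN rows are sorted to the end by default (na_position="last")
df = df.sort_values("Feature Order")
result = df["Feature type"].tolist()
print(result)
```

So an unexpected value does not raise here; it just ends up at the bottom of the sorted frame.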

CodePudding user response:

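Figured it out literally minutes after I posted the question. The header row in this dataset is the 5th row (the traceback points at df2.loc[4] = list(df2.columns), which writes the column names into that row). That means the string "Feature type" is itself one of the values in the "Feature type" column, and since it wasn't one of the declared categories, the assignment threw this error.

Solved by adding the column header name to the categories.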

metabolites_order = CategoricalDtype(['Header', 'Feature type', 'Metabolite', 'Unknown'], ordered=True)
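
For anyone who wants to see the behaviour in isolation, here is a minimal sketch on a small made-up Series (the data is an assumption, not the original dataset). Newer pandas versions raise TypeError rather than ValueError for this, so the sketch catches both:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

order = CategoricalDtype(["Header", "Metabolite", "Unknown"], ordered=True)
s = pd.Series(["Unknown", "Header", "Metabolite"], dtype=order)

# Assigning a value that is not a declared category is rejected
try:
    s.iloc[0] = "Feature type"
    raised = False
except (ValueError, TypeError):
    raised = True

# With "Feature type" declared as a category, the same assignment works
fixed_order = CategoricalDtype(
    ["Header", "Feature type", "Metabolite", "Unknown"], ordered=True
)
s2 = pd.Series(["Unknown", "Header", "Metabolite"], dtype=fixed_order)
s2.iloc[0] = "Feature type"

# Sorting follows the order in which the categories were declared
sorted_values = s2.sort_values().tolist()
print(raised, sorted_values)
```

If the column is already categorical, Series.cat.add_categories can also add the missing category afterwards, though it appends the new category at the end of the order.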