I've been searching for a solution to this for the past few hours. The relevant pandas documentation hasn't helped, and this solution gives me the same error.
I am trying to order my dataframe using a categorical in the following manner:
from pandas.api.types import CategoricalDtype
metabolites_order = CategoricalDtype(['Header', 'Metabolite', 'Unknown'], ordered=True)
df2['Feature type'] = df2['Feature type'].astype(metabolites_order)
df2 = df2.sort_values('Feature type')
The "Feature type" column is populated with the categories correctly. This code runs perfectly in Jupyter Notebook, but when I run it in PyCharm, I get the following error:
Traceback (most recent call last):
  File "/Users/wasim.sandhu/Documents/MSDIALPostProcessor/postprocessor.py", line 138, in process_alignment_file
    df2.loc[4] = list(df2.columns)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 692, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1635, in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1700, in _setitem_with_indexer_split_path
    self._setitem_single_column(loc, v, pi)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1813, in _setitem_single_column
    ser._mgr = ser._mgr.setitem(indexer=(pi,), value=value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 568, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1846, in setitem
    self.values[indexer] = value
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py", line 211, in __setitem__
    value = self._validate_setitem_value(value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/categorical.py", line 1898, in _validate_setitem_value
    raise ValueError(
ValueError: Cannot setitem on a Categorical with a new category, set the categories first
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
What could be causing this? I believe that I've set the categories correctly...
CodePudding user response:
I'd suggest just mapping these categories to integers and then sorting on that column instead.
categories = ['Header', 'Metabolite', 'Unknown']
feature_map = {category: i for i, category in enumerate(categories)}
df['Feature Order'] = df['Feature type'].map(feature_map)
df = df.sort_values('Feature Order')
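To see how the mapping approach behaves when a value is missing from the map, here is a small self-contained sketch. The sample rows and the "Value" column are made up for illustration; only the "Feature type" column name and the three categories come from the question.

```python
import pandas as pd

# Made-up sample data; only the "Feature type" column name comes from the question.
df = pd.DataFrame({
    "Feature type": ["Unknown", "Feature type", "Header", "Metabolite"],
    "Value": [1, 2, 3, 4],
})

categories = ["Header", "Metabolite", "Unknown"]
feature_map = {category: i for i, category in enumerate(categories)}

# Values that are not keys of feature_map (here the stray "Feature type"
# label) are mapped to NaN instead of raising an error.
df["Feature Order"] = df["Feature type"].map(feature_map)

# sort_values returns a new frame and places NaN last by default.
df = df.sort_values("Feature Order")
print(df["Feature type"].tolist())
# ['Header', 'Metabolite', 'Unknown', 'Feature type']
```

Unlike the categorical approach, an unexpected label does not stop the script here; it simply ends up at the bottom, which may or may not be what you want.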
CodePudding user response:
Figured it out literally minutes after I posted the question. The header row in this dataset is the 5th row (the traceback shows it being written with df2.loc[4] = list(df2.columns)). I checked the "Feature type" column, and "Feature type" itself is one of its values. Since that value was not among the categories, pandas raised this error.
Solved by adding the column header name to the categories:
metabolites_order = CategoricalDtype(['Header', 'Feature type', 'Metabolite', 'Unknown'], ordered=True)
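For anyone who wants to reproduce this outside the original script, here is a minimal sketch on made-up data. The sample values and the helper name `can_assign` are illustrative assumptions, not part of the original project. It assigns directly on a `pd.Categorical`, which is where the traceback ends up. The pandas version in the traceback raises `ValueError`; as far as I know newer versions raise `TypeError` for the same situation, so the sketch catches both.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def can_assign(categories, new_value):
    """Return True if new_value can be written into a Categorical
    that was built with the given ordered categories."""
    dtype = CategoricalDtype(categories, ordered=True)
    values = pd.Categorical(["Header", "Metabolite", "Unknown"], dtype=dtype)
    try:
        values[0] = new_value
    except (ValueError, TypeError):
        # pandas refuses values that are not among the declared categories
        return False
    return True


original = ["Header", "Metabolite", "Unknown"]
fixed = ["Header", "Feature type", "Metabolite", "Unknown"]

print(can_assign(original, "Feature type"))  # False: not a declared category
print(can_assign(fixed, "Feature type"))     # True: the label is now a category
```

The position of "Feature type" in the category list also decides where that row lands after sorting, so place it wherever the header row should appear.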