Problem with pandas.Series.cat.rename_categories: getting 'categories must be unique' error


I am new to Python and am working through some exercises.

I have a column in my data called 'sequels' (for books) containing the numbers 1 through 8.

I want to make a new column called 'sequelcategory' that relabels the numbers: 1 should become 'original' and anything else should become 'sequel'. The exercise suggests using pd.Series.cat.rename_categories to do this.

The first hurdle was an error saying the data needed to be categorical (the column was initially int64). I fixed this with:

bookdata['sequels'] = bookdata['sequels'].astype('category')
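As a minimal sketch of what this conversion does, here is a small hypothetical stand-in for bookdata (the values are made up, not from the exercise). The .cat accessor is only available on a categorical column, and astype('category') takes its categories from the distinct values actually present, in sorted order.

```python
import pandas as pd

# Hypothetical stand-in for the question's bookdata
bookdata = pd.DataFrame({"sequels": [1, 2, 3, 1, 8, 5]})

# On an int64 column the .cat accessor is not available
try:
    bookdata["sequels"].cat
except AttributeError as err:
    print("AttributeError:", err)

# Convert to categorical; categories are the distinct values present
bookdata["sequels"] = bookdata["sequels"].astype("category")
print(bookdata["sequels"].cat.categories.tolist())
```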

That worked. I then set about creating my new column:

bookdata["sequelcategory"] = bookdata["sequels"].cat.rename_categories({1: 'original', 2: 'sequel'})
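A small illustration, on hypothetical data, of why this call succeeds: rename_categories with a dict relabels only the categories named as keys and leaves the others unchanged, so the numbers 3 and up stay as integers.

```python
import pandas as pd

# Hypothetical sample; categories are inferred as [1, 2, 3, 8]
sequels = pd.Series([1, 2, 3, 1, 8], dtype="category")

# Only categories 1 and 2 are relabelled; 3 and 8 are left as they are
renamed = sequels.cat.rename_categories({1: "original", 2: "sequel"})
print(renamed.cat.categories.tolist())
print(renamed.tolist())
```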

This works fine. The problem is that I also want the numbers 3 to 8 relabelled 'sequel', so the following:

bookdata["sequelcategory"] = bookdata["sequels"].cat.rename_categories({1: 'original', 2: 'sequel', 3: 'sequel', 4: 'sequel', 5: 'sequel', 6: 'sequel', 7: 'sequel', 8: 'sequel', })

...raises the error: ValueError: Categorical categories must be unique.
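The error arises because rename_categories relabels categories one-to-one: mapping 2 through 8 all to 'sequel' would leave several identical categories, which pandas rejects. One way to still use rename_categories, sketched here on hypothetical data and assuming the sequel numbers are integers starting at 1, is to collapse every number above 1 to 2 with clip before converting to a category, so that only two categories remain to rename.

```python
import pandas as pd

# Hypothetical int64 sequel numbers
sequels = pd.Series([1, 2, 3, 1, 8, 5])

# Reproduce the question's error: several categories renamed to "sequel"
as_category = sequels.astype("category")
try:
    as_category.cat.rename_categories(
        {1: "original", **{n: "sequel" for n in range(2, 9)}}
    )
except ValueError as err:
    print("ValueError:", err)

# Collapse every number above 1 to 2, leaving exactly two categories: 1 and 2
collapsed = sequels.clip(upper=2).astype("category")

# Now the rename is one-to-one, so it is accepted
sequelcategory = collapsed.cat.rename_categories({1: "original", 2: "sequel"})
print(sequelcategory.tolist())
```

If the column has already been converted to category, as in the question, convert it back with .astype(int) before calling clip, since clip works on the numeric values.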

Does anyone have advice on this? I know there are probably 101 other ways to do it, but I have been told I need to use pandas.Series.cat.rename_categories, and I can't for the life of me work it out.

Any help would be greatly appreciated!

Answer:

We could map the numbers to labels before converting the column to a category:

import pandas as pd

bookdata = pd.DataFrame({'book series': [1, 2, 3, 4, 5, 1, 1, 2, 6, 8]})
   book series
0            1
1            2
2            3
3            4
4            5
5            1
6            1
7            2
8            6
9            8
map_dict = {1: 'original', 2: 'sequel', 3: 'sequel', 4: 'sequel', 5: 'sequel', 6: 'sequel', 7: 'sequel', 8: 'sequel'}
bookdata['sequelcategory'] = bookdata['book series'].map(map_dict).astype('category')
   book series sequelcategory
0            1       original
1            2         sequel
2            3         sequel
3            4         sequel
4            5         sequel
5            1       original
6            1       original
7            2         sequel
8            6         sequel
9            8         sequel
bookdata.info() confirms that the new column is categorical (excerpt):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   book series     10 non-null     int64
 1   sequelcategory  10 non-null     category
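A variant of the map approach above, offered as a sketch rather than the answerer's own code: apply the rule "1 means original, anything else means sequel" with a function instead of listing every number, then cast to an explicit pd.CategoricalDtype so both labels are categories even if a subset of the data contains no sequels. The sample data reuses the answer's 'book series' column.

```python
import pandas as pd

bookdata = pd.DataFrame({"book series": [1, 2, 3, 4, 5, 1, 1, 2, 6, 8]})

# Explicit category set: both labels exist even if one never occurs in the data
sequel_dtype = pd.CategoricalDtype(["original", "sequel"])

# Apply the rule directly rather than enumerating 1 to 8 in a dict
labels = bookdata["book series"].map(lambda n: "original" if n == 1 else "sequel")
bookdata["sequelcategory"] = labels.astype(sequel_dtype)

print(bookdata["sequelcategory"].value_counts())
```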