Comparison between ordered categorical type in Pandas not working as expected

Time:12-27

The following code:

import pandas as pd

s2 = pd.Series(['m','l','s','xl','xs'])

size_type = pd.api.types.CategoricalDtype(categories =['xs','s','m','l','xl'], ordered = True)

s3 = s2.astype(size_type)

print(s3)

Yields this result:

0     m
1     l
2     s
3    xl
4    xs
dtype: category
Categories (5, object): ['xs' < 's' < 'm' < 'l' < 'xl']

So I expect the "m" value to be bigger than the "s" value, according to the order that I set when I created the category. But when I check this with a comparison, the result is the opposite:

s3[0] > s3[2]

Yields this result:

False

Why is this happening?

CodePudding user response:

s3[0] and s3[2] return plain Python strings, and strings are compared lexicographically rather than by category order, so 'm' > 's' is False. To compare by the category order, you can use .cat.codes to access the internally stored integer codes:

s3.cat.codes[0] > s3.cat.codes[2]
# True

To see .cat.codes in detail:

s3.cat.codes
#0    2
#1    3
#2    1
#3    4
#4    0
#dtype: int8

s3.cat.codes[0]
#2

s3.cat.codes[2]
#1
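
As an alternative to comparing codes by hand, the comparison can be kept at the Series level, where pandas applies the category order itself. The following is a minimal sketch, assuming the same s3 as in the question; the variable names (plain, larger_than_s, codes_cmp, sorted_sizes) are only illustrative and not part of the original post:

```python
import pandas as pd

size_type = pd.api.types.CategoricalDtype(
    categories=['xs', 's', 'm', 'l', 'xl'], ordered=True)
s3 = pd.Series(['m', 'l', 's', 'xl', 'xs']).astype(size_type)

# Extracting single elements gives plain strings, so this is an
# ordinary lexicographic string comparison: 'm' > 's'
plain = s3.iloc[0] > s3.iloc[2]

# Comparing the whole Series against a category label uses the
# category order ('xs' < 's' < 'm' < 'l' < 'xl')
larger_than_s = s3 > 's'

# Comparing the integer codes also follows the category order
codes_cmp = s3.cat.codes.iloc[0] > s3.cat.codes.iloc[2]

# Sorting respects the category order as well
sorted_sizes = s3.sort_values().tolist()

print(plain)
print(larger_than_s.tolist())
print(codes_cmp)
print(sorted_sizes)
```

The Series-level comparison only works because the dtype was created with ordered = True, and the label being compared against has to be one of the declared categories.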