The following code:
import pandas as pd

s2 = pd.Series(['m', 'l', 's', 'xl', 'xs'])
size_type = pd.api.types.CategoricalDtype(categories=['xs', 's', 'm', 'l', 'xl'], ordered=True)
s3 = s2.astype(size_type)
print(s3)
Yields this result:
0 m
1 l
2 s
3 xl
4 xs
dtype: category
Categories (5, object): ['xs' < 's' < 'm' < 'l' < 'xl']
So I expect the "m" size to be bigger than the "s" size, according to the order that I set when I created the category. But when I check this with a comparison, the result is the opposite:
s3[0] > s3[2]
Yields this result:
False
Why is this happening?
CodePudding user response:
s3[0] and s3[2] return plain strings, which are compared alphabetically rather than by category order, so 'm' > 's' is False. You can use .cat.codes to access the internally stored integer codes for the comparison:
s3.cat.codes[0] > s3.cat.codes[2]
# True
To see .cat.codes in detail:
s3.cat.codes
#0 2
#1 3
#2 1
#3 4
#4 0
#dtype: int8
s3.cat.codes[0]
#2
s3.cat.codes[2]
#1