I have a CSV file that I am reading with pandas.
In the CSV, I have a column with the following values:
x<1
1<x<2
2<x<3
3<x<4
x<4
When I convert the column to the category dtype and look at the category codes, I get something like this:
{
x<1:2
1<x<2:1
2<x<3:3
3<x<4:4
x<4:0
}
But I need the codes to be as follows:
{
x<1:0
1<x<2:1
2<x<3:2
3<x<4:3
x<4:4
}
How can I change the category codes without changing the values in the DataFrame?
I used the following code to convert the column to category:
df['col'] = df['col'].astype('category')
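For context: when astype('category') infers the categories itself, pandas uses the sorted unique values, so the codes follow string (lexicographic) order rather than the numeric ranges the labels describe. A minimal sketch on sample data (the real CSV is not shown, so the DataFrame below is a stand-in) that prints the label-to-code mapping:

```python
import pandas as pd

# Stand-in for the CSV column; the real file is not shown
df = pd.DataFrame({'col': ['x<1', '1<x<2', '2<x<3', '3<x<4', 'x<4']})
df['col'] = df['col'].astype('category')

# A category's code is its position in df['col'].cat.categories
code_map = {cat: code for code, cat in enumerate(df['col'].cat.categories)}
print(code_map)
# {'1<x<2': 0, '2<x<3': 1, '3<x<4': 2, 'x<1': 3, 'x<4': 4}
```

Digits sort before the letter "x", which is why 'x<1' ends up after '3<x<4'.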
Answer:
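You can use pd.api.types.CategoricalDtype to define the category order explicitly. A category's code is its position in that order, so reordering the categories changes the codes while the values in the column stay the same: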
Code:
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'col': ['x<1', '1<x<2', '2<x<3', '3<x<4', 'x<4']})
df['col'] = df['col'].astype('category')
# Get the current category order
category_order1 = df.col.cat.categories.to_list()
print('category_order_1:', category_order1)
# Reverse the category order
co = category_order1[::-1]
df['col'] = df['col'].astype(pd.api.types.CategoricalDtype(categories=co, ordered=True))
# Get the current category order
category_order2 = df.col.cat.categories.to_list()
print('category_order_2:', category_order2)
# Define and apply an arbitrary category order
co = ['2<x<3', 'x<4', 'x<1', '3<x<4', '1<x<2']
df['col'] = df['col'].astype(pd.api.types.CategoricalDtype(categories=co, ordered=True))
# Get the current category order
category_order3 = df.col.cat.categories.to_list()
print('category_order_3:', category_order3)
Output:
category_order_1: ['1<x<2', '2<x<3', '3<x<4', 'x<1', 'x<4']
category_order_2: ['x<4', 'x<1', '3<x<4', '2<x<3', '1<x<2']
category_order_3: ['2<x<3', 'x<4', 'x<1', '3<x<4', '1<x<2']
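Applied to the order asked for in the question, a minimal sketch could look like the following. The sample data and the small in-memory CSV are stand-ins for the real file, and desired_order and col_dtype are illustrative names, not part of the original answer.

```python
import io

import pandas as pd

# Desired order: each label's position in this list becomes its code
desired_order = ['x<1', '1<x<2', '2<x<3', '3<x<4', 'x<4']
col_dtype = pd.api.types.CategoricalDtype(categories=desired_order, ordered=True)

# Option 1: convert an existing column; the string values are unchanged
df = pd.DataFrame({'col': ['x<1', '1<x<2', '2<x<3', '3<x<4', 'x<4']})
df['col'] = df['col'].astype(col_dtype)
print(df['col'].cat.codes.tolist())  # [0, 1, 2, 3, 4]

# Option 2: apply the dtype while reading the CSV (in-memory stand-in file)
csv_text = "col\nx<4\nx<1\n2<x<3\n"
df2 = pd.read_csv(io.StringIO(csv_text), dtype={'col': col_dtype})
print(df2['col'].cat.codes.tolist())  # [4, 0, 2]
```

The codes follow the category order whether or not ordered=True is set; ordered only controls whether comparisons such as < and sorting respect that order. For a column that is already categorical, df['col'].cat.reorder_categories(desired_order, ordered=True) does the same, provided the list contains exactly the existing categories. Any value in the CSV that is not listed in desired_order becomes NaN (code -1).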