How to change the category codes in a category column in pandas

Time:02-14

I have a CSV file that I am reading with pandas.

In the CSV, one column has the following values:

x<1
1<x<2
2<x<3
3<x<4
x<4

When I convert the column to the category dtype and look at the category codes, I get something like this:

{
x<1:2
1<x<2:1
2<x<3:3
3<x<4:4
x<4:0 
}

but I need the codes to be as follows:

{
x<1:0
1<x<2:1
2<x<3:2
3<x<4:3
x<4:4 
}

How can I change the category code without changing the dataframe?

I used the following code to convert the column to category:

df['col'] = df['col'].astype('category')

CodePudding user response:

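A category's code is its position in the column's list of categories, so you change the codes by changing the category order. By default the categories are sorted, which for strings means lexicographically. You can use pd.api.types.CategoricalDtype to set the order yourself, as follows: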

Code:

import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({'col': ['x<1', '1<x<2', '2<x<3', '3<x<4', 'x<4']})
df['col'] = df['col'].astype('category')

# Get the current category order
category_order1 = df.col.cat.categories.to_list()
print('category_order_1:', category_order1)

# Invert the category order
co = category_order1[::-1]
df['col'] = df['col'].astype(pd.api.types.CategoricalDtype(categories=co, ordered=True))

# Get the current category order
category_order2 = df.col.cat.categories.to_list()
print('category_order_2:', category_order2)

# Define and apply arbitrary category order
co = ['2<x<3', 'x<4', 'x<1', '3<x<4', '1<x<2']
df['col'] = df['col'].astype(pd.api.types.CategoricalDtype(categories=co, ordered=True))

# Get the current category order
category_order3 = df.col.cat.categories.to_list()
print('category_order_3:', category_order3)

Output:

category_order_1: ['1<x<2', '2<x<3', '3<x<4', 'x<1', 'x<4']
category_order_2: ['x<4', 'x<1', '3<x<4', '2<x<3', '1<x<2']
category_order_3: ['2<x<3', 'x<4', 'x<1', '3<x<4', '1<x<2']
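
To get exactly the codes asked for in the question, the same approach can be applied with the desired order spelled out. The following is a minimal sketch, not the only way to do it: the names wanted, dtype, code_map, csv_text and df2 are made up for illustration, and the in-memory CSV only stands in for the real file, whose contents are not shown in the question.

```python
import io
import pandas as pd

# Desired order from the question; a label's position in this list becomes its code.
wanted = ['x<1', '1<x<2', '2<x<3', '3<x<4', 'x<4']
dtype = pd.api.types.CategoricalDtype(categories=wanted, ordered=True)

# Sample data in a shuffled row order (assumed, for illustration only)
df = pd.DataFrame({'col': ['3<x<4', 'x<1', 'x<4', '1<x<2', '2<x<3']})
df['col'] = df['col'].astype(dtype)

# Map each category label to its integer code
code_map = {label: code for code, label in enumerate(df['col'].cat.categories)}
print(code_map)
# {'x<1': 0, '1<x<2': 1, '2<x<3': 2, '3<x<4': 3, 'x<4': 4}

# Per-row codes; the values shown in the column are unchanged
print(df['col'].cat.codes.tolist())
# [3, 0, 4, 1, 2]

# The same dtype can also be passed to read_csv, so the column is
# categorical with the wanted order as soon as it is loaded.
csv_text = "col\nx<1\n2<x<3\nx<4\n"  # hypothetical stand-in for the real CSV
df2 = pd.read_csv(io.StringIO(csv_text), dtype={'col': dtype})
print(df2['col'].cat.codes.tolist())
# [0, 2, 4]
```

Only the codes change; the labels stored in the column stay the same. Any value that is not listed in wanted would become NaN with code -1, so the list should cover every value that occurs in the column.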