Replacing values for Pandas categorical column based on category codes

Time:06-13

I am looking for a more elegant approach to replacing the values of a categorical column based on its category codes. I cannot use the map method because the original values are not known in advance.

I am currently using the following approach:

df['Gender'] = pd.Categorical.from_codes(df['Gender'].cat.codes.fillna(-1), categories=['Female', 'Male'])

This approach feels inelegant because I convert the categorical column to integer codes and then convert it back to categorical. The full code is below.

import pandas as pd

df = pd.DataFrame({    
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': ['M', 'M', 'F', pd.NA],
})

df['Gender'] = df['Gender'].astype('category')

# don't want to do this as original values may not be known to establish the dict
# df['Gender'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})

# offline, we know 0 = Female, 1 = Male
# what is a more elegant way to do the step below?
df['Gender'] = pd.Categorical.from_codes(df['Gender'].cat.codes.fillna(-1), categories=['Female', 'Male'])
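As a side note on the snippet above, cat.codes already encodes missing values as -1, so the fillna(-1) call should be redundant. A minimal sketch, assuming the same sample data as in the question (the names s, codes, and relabeled are only for illustration):

```python
import pandas as pd

# Same sample values as the question's 'Gender' column
s = pd.Series(['M', 'M', 'F', pd.NA]).astype('category')

# Integer codes follow the sorted categories ['F', 'M'];
# missing values are represented by the code -1
codes = s.cat.codes
print(codes.tolist())

# Rebuild the categorical with new labels; -1 stays missing
relabeled = pd.Categorical.from_codes(codes.to_numpy(), categories=['Female', 'Male'])
print(list(relabeled))
```

This still takes the round trip through integer codes; it only drops the unnecessary fillna step.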

CodePudding user response:

Here is one way to do that.

Create a dictionary of the unique items, using enumerate to assign each one an index:

d = {item: i for i, item in enumerate(df['Gender'].unique())}

Then use map to apply the dictionary to the column:

df['cat'] = df['Gender'].map(d)
df
    Name    Gender  cat
0   Jack    M       0
1   John    M       0
2   Jil     F       1
3   Jax     <NA>    2

CodePudding user response:

What about using cat.rename_categories?

df['Gender'] = (df['Gender'].astype('category')
                .cat.rename_categories(['Female', 'Male'])
               )

Output:

   Name  Gender
0  Jack    Male
1  John    Male
2   Jil  Female
3   Jax     NaN
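If the offline knowledge is a code-to-label lookup rather than an ordered list, one possible variation is to build a rename mapping from the current categories and pass it to rename_categories, which also accepts a dict-like argument. A sketch under that assumption; code_to_label and renames are hypothetical names, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': pd.Categorical(['M', 'M', 'F', None]),
})

# Hypothetical lookup known offline: category code -> new label
code_to_label = {0: 'Female', 1: 'Male'}

# Pair each existing category with the label for its code;
# the position of a category in .cat.categories is its code
renames = {old: code_to_label[code]
           for code, old in enumerate(df['Gender'].cat.categories)}

# Rename in place of the old labels; the column stays categorical
df['Gender'] = df['Gender'].cat.rename_categories(renames)
print(df)
```

This never needs the original labels to be written out by hand, and missing values stay missing. It does assume the categories are in the same order as when the codes were recorded.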