I am looking for a more elegant way to replace the values of a categorical column based on its category codes. I cannot use the map
method because the original values are not known in advance.
I am currently using the following approach:
df['Gender'] = pd.Categorical.from_codes(df['Gender'].cat.codes.fillna(-1), categories=['Female', 'Male'])
This approach feels inelegant because I convert the categorical column to integer codes and then convert it back to categorical. Full code is below.
import pandas as pd
df = pd.DataFrame({
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': ['M', 'M', 'F', pd.NA],
})
df['Gender'] = df['Gender'].astype('category')
# don't want to do this as original values may not be known to establish the dict
# df['Gender'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})
# offline, we know 0 = Female, 1 = Male
# what is a more elegant way to do the step below?
df['Gender'] = pd.Categorical.from_codes(df['Gender'].cat.codes.fillna(-1), categories=['Female', 'Male'])
CodePudding user response:
Here is one way to do that.
Create a dictionary of the unique items, using enumerate to assign an index to each:
d = {item: i for i, item in enumerate(df['Gender'].unique())}
Then use map to map the values:
df['cat'] = df['Gender'].map(d)
df
   Name Gender  cat
0  Jack      M    0
1  John      M    0
2   Jil      F    1
3   Jax   <NA>    2
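If the goal is readable labels rather than integer indices, the same dictionary-plus-map idea can be built from the column's own categories, so the original values never have to be typed out. This is only a sketch: it assumes the label list is ordered to match the category codes (0 = Female, 1 = Male, as stated in the question), and the names `labels` and `mapping` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': ['M', 'M', 'F', pd.NA],
})
df['Gender'] = df['Gender'].astype('category')

# Assumption: labels are listed in category-code order (0 = Female, 1 = Male).
labels = ['Female', 'Male']

# Pair each existing category with its label by position, so the
# original values never have to be spelled out by hand.
mapping = dict(zip(df['Gender'].cat.categories, labels))

# Missing entries have no key in the mapping and stay missing.
df['Gender'] = df['Gender'].map(mapping)
```

The pairing is purely positional, so it is only as reliable as the assumption that the offline code-to-label table matches the order of `cat.categories`.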
CodePudding user response:
What about using cat.rename_categories?
df['Gender'] = (df['Gender'].astype('category')
                .cat.rename_categories(['Female', 'Male'])
                )
Output:
   Name  Gender
0  Jack    Male
1  John    Male
2   Jil  Female
3   Jax     NaN
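Passing a list to rename_categories assigns the new labels by position, so it depends on the existing category order matching the known codes. A small check along the following lines can confirm that only the labels change and the codes stay put. It is a sketch using the question's sample data; `codes_before` and `codes_after` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': ['M', 'M', 'F', pd.NA],
})
df['Gender'] = df['Gender'].astype('category')

# Codes before renaming; missing values carry the code -1.
codes_before = df['Gender'].cat.codes.tolist()

# Labels are assigned by position: code 0 -> 'Female', code 1 -> 'Male'.
df['Gender'] = df['Gender'].cat.rename_categories(['Female', 'Male'])

# Renaming only swaps the labels; the underlying codes are untouched.
codes_after = df['Gender'].cat.codes.tolist()
print(codes_before == codes_after)
```

Because the codes are never materialised as a separate integer column, this avoids the round trip through from_codes that the question wanted to get rid of.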