I want to replace some values in categorical data columns with np.nan. What is the best method for replacing values in a case-insensitive manner while maintaining the same categories (in the same order)?
import pandas as pd
import numpy as np
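# set up a DF with ordered categories
values = ['one','two','three','na','Na','NA']
df = pd.DataFrame({
    'categ': values
})
# astype('category') alone sorts the categories alphabetically,
# so pass the intended order explicitly
df['categ'] = df['categ'].astype(pd.CategoricalDtype(categories=values, ordered=True))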
# replace values
df['categ'].replace(
    to_replace='na',
    value=np.nan
)
CodePudding user response:
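You could do the replacement before converting to category. Note that str.lower() lowercases every value in the column, not only the 'na' variants: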
import pandas as pd
import numpy as np
# set up a DF with the raw string values
values = ['one','two','three','na','Na','NA']
df = pd.DataFrame({
    'categ': values
})
# lowercase everything, then replace 'na' with NaN (convert to category afterwards)
df['categ'] = df['categ'].str.lower().replace('na',np.nan)
Output
   categ
0    one
1    two
2  three
3    NaN
4    NaN
5    NaN
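If the remaining values must keep their original casing, one option is to build a case-insensitive boolean mask instead of lowercasing the whole column. The following is a minimal sketch, not a definitive solution; the names raw, is_na, cleaned and kept are made up for illustration, and 'One' is capitalized here only to show that it survives unchanged.

```python
import pandas as pd

# Hypothetical sample data; 'One' is capitalized to show casing is preserved.
raw = pd.Series(['One', 'two', 'three', 'na', 'Na', 'NA'])

# Case-insensitive comparison used only to build a mask.
is_na = raw.str.lower() == 'na'

# Set the masked positions to NaN, leaving all other values untouched.
cleaned = raw.mask(is_na)

# Keep the first-seen order of the remaining labels as the category order.
kept = [v for v in dict.fromkeys(raw) if v.lower() != 'na']
categ = cleaned.astype(pd.CategoricalDtype(categories=kept, ordered=True))

print(categ)
```

The category order is taken from the order of first appearance in the data, which is an assumption; pass your own list to categories if a different order is needed.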
CodePudding user response:
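You can also use a case-insensitive regex flag. Anchor the pattern so that it matches only whole values, not strings that merely contain "na" (such as "banana"):
df['categ'].replace(
    to_replace=r'(?i)^na$',
    regex=True,
    value=np.nan
)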