case insensitive pandas.Series.replace

Time:09-17

I want to replace some values in categorical data columns with np.nan. What is the best method for replacing values in a case-insensitive manner while maintaining the same categories (in the same order)?

import pandas as pd 
import numpy as np 

# set up a DF with ordered categories
values = ['one','two','three','na','Na','NA']
df = pd.DataFrame({
    'categ' : values
})
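# build the categorical with an explicit category order; assigning to
# .cat.categories renames labels instead of reordering them, and
# pandas 2.x no longer allows it
df['categ'] = pd.Categorical(df['categ'], categories=values)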


# replace values
df['categ'].replace(
    to_replace='na',
    value=np.nan
)

CodePudding user response:

One option is to lowercase the strings and replace the values before converting to category:

import pandas as pd 
import numpy as np 

# set up a DF with ordered categories
values = ['one','two','three','na','Na','NA']
df = pd.DataFrame({
    'categ' : values
})


df['categ'] = df['categ'].str.lower().replace('na',np.nan)

Output

   categ
0    one
1    two
2  three
3    NaN
4    NaN
5    NaN
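
The snippet above leaves the column as plain strings. To get back to a categorical column with the original categories in their original order, one possible follow-up is to pass the category list explicitly. This is a minimal sketch that reuses `values` from the question; the name `cleaned` is only illustrative. It relies on the remaining labels already being lowercase, because `str.lower()` changes every value, not just the 'na' variants.

```python
import numpy as np
import pandas as pd

values = ['one', 'two', 'three', 'na', 'Na', 'NA']
df = pd.DataFrame({'categ': values})

# lowercase first, then replace the single remaining spelling of 'na'
cleaned = df['categ'].str.lower().replace('na', np.nan)

# convert afterwards, passing the full category list so that the
# categories and their order match the original set-up
df['categ'] = pd.Categorical(cleaned, categories=values)
```

The 'na', 'Na' and 'NA' labels stay in the category list but no row uses them any more.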

CodePudding user response:

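You can also use a case-insensitive regex flag, like so:

# the ^ and $ anchors restrict the match to whole values; without them
# any string that merely contains 'na' would also be replaced
df['categ'].replace(
    to_replace=r'^(?i:na)$',
    regex=True,
    value=np.nan
)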
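
If the column is already categorical, another way to get the same result is to skip `replace` and use a boolean mask, which leaves the category list untouched. This is a minimal sketch, assuming all categories are strings; the names `s`, `is_na` and `masked` are illustrative and not from the answers above.

```python
import numpy as np
import pandas as pd

values = ['one', 'two', 'three', 'na', 'Na', 'NA']
s = pd.Series(pd.Categorical(values, categories=values))

# compare a lowercased string copy, so the match ignores case
is_na = s.astype(str).str.lower() == 'na'

# mask() sets the matching rows to NaN; the dtype stays categorical
# and the categories keep their original order
masked = s.mask(is_na)
```

Calling `masked.cat.remove_unused_categories()` afterwards would drop the now-unused 'na' labels, if that is preferred.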