Pandas column value replace using a dictionary with case insensitive match
I have a replacement dictionary and the following requirements:
Replace the values in the pandas DataFrame using replace_dict, matching case-insensitively. In addition, if a value ends with '.' followed by one or more zeros, strip that part (for example, '2.00000' should become '2').
import pandas as pd
replace_dict = {('True', 'Yes'): 1, ('False', 'No'): 0, '.0': ''}
df = pd.DataFrame(data = ['True','False', 'Yes', 2.0, '2.00000'])
CodePudding user response:
We can use np.select from numpy in this case:
import numpy as np
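# One boolean condition per value to look for (exact, case-sensitive comparisons).
condlist = [df[0] == 'True',
            df[0] == 'Yes',
            df[0] == 'False',
            df[0] == 'No',
            df[0] == '.0']  # only matches a value that is exactly '.0'
# Replacement for each condition, in the same order as condlist.
choicelist = [1,
              1,
              0,
              0,
              '']
# Rows that match no condition get the default.
df['new_vals'] = np.select(condlist, choicelist, default=np.nan)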
Output:
         0 new_vals
0     True        1
1    False        0
2      Yes        1
3      2.0      nan
4  2.00000      nan
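The comparisons above are case-sensitive, and `df[0] == '.0'` only matches a value that is exactly '.0', so '2.0' and '2.00000' fall through to the default. Below is a minimal sketch of a case-insensitive variant, assuming it is acceptable to compare against a lowercased string copy of the column. The names `lowered`, `truthy`, and `falsy` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=['True', 'False', 'Yes', 2.0, '2.00000'])

# Lowercased string copy used only for matching; df[0] itself is left unchanged.
lowered = df[0].astype(str).str.lower()

truthy = ['true', 'yes']   # assumed spellings that map to 1
falsy = ['false', 'no']    # assumed spellings that map to 0

condlist = [lowered.isin(truthy), lowered.isin(falsy)]
choicelist = [1, 0]

# Rows matching neither list fall back to NaN, so the result is a float column.
df['new_vals'] = np.select(condlist, choicelist, default=np.nan)
print(df)
```

Keeping the choices numeric avoids mixing integers and strings in one choicelist. Stripping the trailing zeros is a separate string operation and is not covered by this sketch.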
CodePudding user response:
Try using pandas.DataFrame.replace, and split each tuple key into separate single-key entries (lowercased here, so that the match is case-insensitive):
Input:
      col1
0     True
1    False
2      Yes
3      2.0
4  2.00000
Script:
df['col1'] = df['col1'].astype(str).str.lower()
replace_dict = {'true': 1, 'yes': 1, 'false': 0, 'no': 0, '.0': ''}
df['col1'] = df['col1'].replace(replace_dict)
df
Output:
      col1
0        1
1        0
2        1
3      2.0
4  2.00000
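Note that rows 3 and 4 are unchanged in the output above: with a plain dictionary, replace matches whole values, so the '.0' entry never matches '2.0' or '2.00000'. One possible way to strip the trailing zeros is a regular-expression replacement on the string form of the column. This is a sketch under that assumption; the names `as_text` and `stripped` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['True', 'False', 'Yes', 2.0, '2.00000']})

# Work on a string copy of the column so the float 2.0 becomes '2.0'.
as_text = df['col1'].astype(str)

# Remove a trailing '.' followed by one or more zeros, e.g. '2.00000' -> '2'.
stripped = as_text.str.replace(r'\.0+$', '', regex=True)

print(stripped.tolist())  # ['True', 'False', 'Yes', '2', '2']
```

Doing this before the dictionary replacement keeps every value a string while the regular expression runs.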
If you don't want to change non-relevant rows to lower case, you can try this:
Input:
                     col1
0                    True
1                   False
2                     Yes
3                     2.0
4                 2.00000
5  Hey I AM not relevant!
Script:
replace_dict = {'true': 1, 'yes': 1, 'false': 0, 'no': 0, '.0': ''}
mask_relevant_rows = df['col1'].astype(str).str.lower().isin(replace_dict.keys())
df.loc[mask_relevant_rows, 'col1'] = df.loc[mask_relevant_rows, 'col1'].astype(str).str.lower().replace(replace_dict)
Output:
                     col1
0                       1
1                       0
2                       1
3                     2.0
4                 2.00000
5  Hey I AM not relevant!
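Both requirements from the question (case-insensitive lookup and stripping a trailing '.0...') can also be handled in one pass, starting from the original tuple-keyed dictionary. The following is a sketch only: `lookup`, `trailing_zeros`, and `normalize` are hypothetical names, and it assumes every unmatched value may be converted to a string.

```python
import re
import pandas as pd

replace_dict = {('True', 'Yes'): 1, ('False', 'No'): 0, '.0': ''}

# Flatten the tuple keys into a lowercase lookup; the '.0' entry is
# handled by the regular expression below instead of the dictionary.
lookup = {}
for key, value in replace_dict.items():
    if isinstance(key, tuple):
        for name in key:
            lookup[name.lower()] = value

trailing_zeros = re.compile(r'\.0+$')

def normalize(value):
    """Hypothetical helper: map known words case-insensitively,
    otherwise strip a trailing '.' followed by zeros."""
    text = str(value)
    if text.lower() in lookup:
        return lookup[text.lower()]
    return trailing_zeros.sub('', text)

df = pd.DataFrame({'col1': ['True', 'False', 'Yes', 2.0, '2.00000',
                            'Hey I AM not relevant!']})
df['col1'] = df['col1'].map(normalize)
print(df['col1'].tolist())
# [1, 0, 1, '2', '2', 'Hey I AM not relevant!']
```

Rows that match neither the lookup nor the pattern keep their original casing, at the cost of a Python-level function call per row.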
Hope it helps