Pandas column value replace using a dictionary with case insensitive match
I have a replacement dictionary and my conditions as below:
Replace the pandas df values with the replace_dict, also if any value ends with . followed by one or more zeros replace '.'zeros with ''(strip the .0s part)
import pandas as pd
replace_dict = {('True', 'Yes'): '1', ('False', 'No'): '0', '.0': ''}
df = pd.DataFrame(data = ['True','False', 'Yes', 2.0, '2.00000'])
CodePudding user response:
Try using pd.replace: pandas.DataFrame.replace
And replace the tuple with a single key and single value:
Input:
col1
0 True
1 False
2 Yes
3 2.0
4 2.00000
Script:
df['col1'] = df['col1'].astype(str).str.lower()
replace_dict = {'true': 1, 'yes': 1, 'false': 0, 'no': 0, '.0': ''}
df['col1'] = df['col1'].replace(replace_dict)
df
Output:
col1
0 1
1 0
2 1
3 2.0
4 2.00000
If you don't want to change non-relevant rows to lower case, you can try this:
Input:
col1
0 True
1 False
2 Yes
3 2.0
4 2.00000
5 Hey I AM not relevant!
Script:
replace_dict = {'true': 1, 'yes': 1, 'false': 0, 'no': 0, '.0': ''}
mask_relevant_rows = df['col1'].astype(str).str.lower().isin(replace_dict.keys())
df.loc[mask_relevant_rows, 'col1'] = df[mask_relevant_rows]['col1'].astype(str).str.lower().replace(replace_dict)
Output:
col1
0 1
1 0
2 1
3 2.0
4 2.00000
5 Hey I AM not relevant!
Hope it helps
CodePudding user response:
its not a classy solution but `
def rep_val(x):
if x=='True' or x=='Yes':
x=1
return x
elif x=='False' or x=='No':
x=0
return x
elif '.0' in x:
x=''
return x
df.a.astype(str).apply(rep_val)`
CodePudding user response:
df = pd.DataFrame(data=['true', 'TRue','False', 'Yes', '2.0', '2.0000', '.0002',
'A true warrior'],
columns=['col_a'])
replace_dict = {'True': '1', 'Yes': '1', 'False': '0', '.0 $': ''}
keys, values = list(replace_dict.keys()), list(replace_dict.values())
keys = ['^(?i){}$'.format(x) for x in keys]
df['col_b'] = df['col_a'].astype(str).replace(to_replace=keys ,value=values,
regex=True)
df
Output keys = ['^(?i){}$'.format(x) for x in keys]
replaces the exact value (but case insensitive). Look at the first 2 rows and last row
col_a col_b
0 true 1
1 TRue 1
2 False 0
3 Yes 1
4 2.0 2.0
5 2.0000 2.0000
6 .0002 .0002
7 A true warrior A true warrior
Output keys = ['(?i){}'.format(x) for x in keys]
replaces the substring. Look at the first 2 rows and the last row
col_a col_b
0 true 1
1 TRue 1
2 False 0
3 Yes 1
4 2.0 2.0
5 2.0000 2.0000
6 .0002 .0002
7 A true warrior A 1 warrior