Home > Enterprise >  Pandas - Replace column values using a dictionary with case insensitive match
Pandas - Replace column values using a dictionary with case insensitive match

Time:10-13

Pandas column value replace using a dictionary with case insensitive match

I have a replacement dictionary and my conditions as below:

Replace the pandas df values with the replace_dict, also if any value ends with . followed by one or more zeros replace '.'zeros with ''(strip the .0s part)

import pandas as pd
replace_dict = {('True', 'Yes'): '1', ('False', 'No'): '0', '.0': ''}
df = pd.DataFrame(data = ['True','False', 'Yes', 2.0, '2.00000'])

CodePudding user response:

Try using pd.replace: pandas.DataFrame.replace

And replace the tuple with a single key and single value:

Input:

    col1
0   True
1   False
2   Yes
3   2.0
4   2.00000

Script:

df['col1'] = df['col1'].astype(str).str.lower()
replace_dict = {'true': 1, 'yes': 1, 'false': 0, 'no': 0, '.0': ''}
df['col1'] = df['col1'].replace(replace_dict)
df

Output:

col1
0   1
1   0
2   1
3   2.0
4   2.00000

If you don't want to change non-relevant rows to lower case, you can try this:

Input:

col1
0   True
1   False
2   Yes
3   2.0
4   2.00000
5   Hey I AM not relevant!

Script:

replace_dict = {'true': 1, 'yes': 1, 'false': 0, 'no': 0, '.0': ''}
mask_relevant_rows = df['col1'].astype(str).str.lower().isin(replace_dict.keys())
df.loc[mask_relevant_rows, 'col1'] = df[mask_relevant_rows]['col1'].astype(str).str.lower().replace(replace_dict)

Output:

col1
0   1
1   0
2   1
3   2.0
4   2.00000
5   Hey I AM not relevant!

Hope it helps

CodePudding user response:

its not a classy solution but `

def rep_val(x):
    if x=='True' or x=='Yes':
        x=1
        return x
    elif x=='False' or x=='No':
        x=0
        return x
    elif '.0' in x:
        x=''
        return x
df.a.astype(str).apply(rep_val)`

CodePudding user response:

df = pd.DataFrame(data=['true', 'TRue','False', 'Yes', '2.0', '2.0000', '.0002', 
                        'A true warrior'], 
                  columns=['col_a'])

replace_dict = {'True': '1', 'Yes': '1', 'False': '0', '.0 $': ''} 

keys, values = list(replace_dict.keys()), list(replace_dict.values())
keys = ['^(?i){}$'.format(x) for x in keys]
df['col_b'] = df['col_a'].astype(str).replace(to_replace=keys ,value=values, 
                                              regex=True)
df

Output keys = ['^(?i){}$'.format(x) for x in keys] replaces the exact value (but case insensitive). Look at the first 2 rows and last row

    col_a           col_b
0   true            1
1   TRue            1
2   False           0
3   Yes             1
4   2.0             2.0
5   2.0000          2.0000
6   .0002           .0002
7   A true warrior  A true warrior

Output keys = ['(?i){}'.format(x) for x in keys] replaces the substring. Look at the first 2 rows and the last row

    col_a           col_b
0   true            1
1   TRue            1
2   False           0
3   Yes             1
4   2.0             2.0
5   2.0000          2.0000
6   .0002           .0002
7   A true warrior  A 1 warrior
  • Related