Unable to replace an existing column basis on another column-CodePudding

Below are two dataframes. The dataframe df1 is a cleanfile which has to be used as the mapping file to another dataframe df2 which has uncleaned information.

    df1=pd.DataFrame({'Make': {0: 'ASHOK LEYLAND', 1: 'ASTON MARTIN', 2: 'ASTON MARTIN'},
     'Model': {0: 'STILE', 1: 'DB9', 2: 'RAPIDE'},
     'Variant': {0: 'LE 7 STR', 1: 'VOLANTE', 2: 'LUXE'},
     'Fuel': {0: 'DIESEL', 1: 'PETROL', 2: 'PETROL'}})

    df2=pd.DataFrame({'Make': {0: 'ASHOK LEYLANDSTILELE 7 STR',
      1: 'ASTON MARTINDB9VOLANTE',
      2: 'ASTON MARTINRAPIDELUXE'},
     'Model': {0: 'STILELE 7 STR', 1: 'DB9VOLANTE', 2: 'RAPIDELUXE'},
     'Variant': {0: 'LE 7 STRSTILE', 1: 'VOLANTEDB9', 2: 'LUXERAPIDE'},
     'Fuel': {0: 'Dieseel', 1: 'Dieseel', 2: 'PETROLjlljlj'}})

I have used the below code to clean the 'Make' column of df2 basis on the 'Make' column of df1. However i dont get any result on my new column 'Make_new'.Below is the code:

    df2['Make_new'] = df2['Make'].apply(lambda v: [Make for Make in df1 if Make in ('ASHOK','ASTON')])

CodePudding user response：

You could use a regular expression to match a given set of "allowed" names to the the defective values and extract only the matched part. Note that following solution will only cover the situation in which the defective names have correct prefixes.

import re

def clean_col(correct_names_expression):
    def clean_row_func(x):
        if match_res := re.match(correct_names_expression, x):  # if no match is asserted, then None is returned
            return match_res[0]
        else:
            return None
    return clean_row_func

func_to_correct_rows = clean_col('('   '|'.join(df1.Make.tolist())   ')')

df2.Make.apply(func_to_correct_rows )

Out:
0    ASHOK LEYLAND
1     ASTON MARTIN
2     ASTON MARTIN

CodePudding user response：

I don't see why you would need df1 at all. The "cleaning" logic is pretty simple. This will do the trick.

def clean(row):
    if 'ASHOK' in row.Make:
        return 'ASHOK LEYLAND'
    if 'ASTON' in row.Make:
        return 'ASTON MARTIN'
    return row.Make

df2['cleaned'] = df2.apply(clean, axis=1)