Data cleansing using python-CodePudding

Hi need help for cleansing a data from a excel file using python. below is the sample table.

As you can see column "Account Name" as some string values from column "Billing Street"

I need to replace all the matching string (from "Billing Street") to blank in column "Account Name", if there is no string matching just need to retain the same value from "Account Name".

import pandas as pd
import numpy as np
import os

File= pd.read_excel('Test_data_for_script.xlsx')
File1=File.copy()

File_1['CLEANSE_NAME']= File_1['Account Name'].str.strip(File_1['Billing Street'])

using the above method the output was "NaN".

Below is the expected result.

Output required.

Thanks Vydeesh

CodePudding user response：

You could apply() a replace() to each row of dataframe.

Example

import pandas as pd

df = pd.DataFrame({
           'A':['A - some B text','A - some other B text'], 
            B':['some B text','some other B text']
     })

df['C'] = df.apply(lambda row: row.A.replace(row.B,''), axis=1)

df

Output

	A	B	C
0	A - some B text	some B text	A -
1	A - some other B text	some other B text	A -

CodePudding user response：

I hope this will work for your solution, (use pandas apply function to create a custom match with replace)

import pandas as pd
import numpy as np
import os

File= pd.read_excel('./data_cleansing.xlsx')
File1=File.copy()
File1.head()
def replace_func(acct_name, billing_str):
    if acct_name == billing_str:
        return ''
    else:
        return acct_name

File1['Account Name'] = File1.apply(lambda x: replace_func(x['Account Name'], x['Billing Street']), axis=1)
File1