Hi need help for cleansing a data from a excel file using python. below is the sample table.
As you can see column "Account Name" as some string values from column "Billing Street"
I need to replace all the matching string (from "Billing Street") to blank in column "Account Name", if there is no string matching just need to retain the same value from "Account Name".
import pandas as pd
import numpy as np
import os
File= pd.read_excel('Test_data_for_script.xlsx')
File1=File.copy()
File_1['CLEANSE_NAME']= File_1['Account Name'].str.strip(File_1['Billing Street'])
using the above method the output was "NaN".
Below is the expected result.
Thanks Vydeesh
CodePudding user response:
You could apply()
a replace()
to each row of dataframe.
Example
import pandas as pd
df = pd.DataFrame({
'A':['A - some B text','A - some other B text'],
B':['some B text','some other B text']
})
df['C'] = df.apply(lambda row: row.A.replace(row.B,''), axis=1)
df
Output
A | B | C | |
---|---|---|---|
0 | A - some B text | some B text | A - |
1 | A - some other B text | some other B text | A - |
CodePudding user response:
I hope this will work for your solution, (use pandas apply function to create a custom match with replace)
import pandas as pd
import numpy as np
import os
File= pd.read_excel('./data_cleansing.xlsx')
File1=File.copy()
File1.head()
def replace_func(acct_name, billing_str):
if acct_name == billing_str:
return ''
else:
return acct_name
File1['Account Name'] = File1.apply(lambda x: replace_func(x['Account Name'], x['Billing Street']), axis=1)
File1