Home > other >  How to clean data in dataframe column?
How to clean data in dataframe column?

Time:12-19

Hi need help for cleansing a data from a excel file using python. below is the sample table.

As you can see column "Account Name" as some string values from column "Billing Street"

Input file

I need to replace all the matching string (from "Billing Street") to blank in column "Account Name", if there is no string matching just need to retain the same value from "Account Name".

import pandas as pd
import numpy as np
import os

File= pd.read_excel('Test_data_for_script.xlsx')
File1=File.copy()

File_1['CLEANSE_NAME']= File_1['Account Name'].str.strip(File_1['Billing Street'])

using the above method the output was "NaN".

Below is the expected result.

Output required

CodePudding user response:

I hope this will work for your solution, (use pandas apply function to create a custom match with replace)

import pandas as pd
import numpy as np
import os

File= pd.read_excel('./data_cleansing.xlsx')
File1=File.copy()
File1.head()
def replace_func(acct_name, billing_str):
    if acct_name == billing_str:
        return ''
    else:
        return acct_name

File1['Account Name'] = File1.apply(lambda x: replace_func(x['Account Name'], x['Billing Street']), axis=1)
File1

CodePudding user response:

You could apply() a replace() to each row of dataframe.

Example

import pandas as pd

df = pd.DataFrame({
        'A':['A - some B text','A - some other B text', 'A - just A'], 
        'B':['B text','other B text','no matching']
     })
df['C'] = df.apply(lambda row: row.A.replace(row.B,''), axis=1)
df

Output

A B C
0 A - some B text B text A - some
1 A - some other B text other B text A - some
2 A - just A no matching A - just A

CodePudding user response:

Use apply and replace function from pandas.

import pandas as pd
import numpy as np
import os

File = pd.read_excel("Test_data_for_script.xlsx")
File1 = File.copy()

File1["CLEANSE_NAME"] = File1.apply(
    lambda row: row["Account Name"].replace(row["Billing Street"], ""), axis=1
)

print(File1)
  • Related