I have a DataFrame where I need to perform several similar steps on one column, like in the example below:
dfd_landing['Supplier name'] = dfd_landing['Supplier name'].apply(lambda x : x.replace(',',''))
dfd_landing['Supplier name'] = dfd_landing['Supplier name'].apply(lambda x : x.replace('.',' '))
dfd_landing['Supplier name'] = dfd_landing['Supplier name'].apply(lambda x : x.replace('-',' '))
dfd_landing['Supplier name'] = dfd_landing['Supplier name'].str.strip()
Is there any way to consolidate all these steps into one line, just for the sake of not repeating the lines?
Answer:
You can use method chaining. You can also use .str.replace instead of the lambda function. Finally, you can use a regex to remove all of those symbols at once:
dfd_landing['Supplier name'] = (
    dfd_landing['Supplier name']
    .str.replace(r'[,.-]', '', regex=True)
    .str.strip()
)
I like to format method chains by putting everything inside parentheses and starting each method on a new line. I typically think of .str as more of a prefix than a method, so I keep it on the same line as the string method I am actually calling.
Sample input:
import pandas as pd
dfd_landing = pd.DataFrame({'Supplier name': [', foo.-bar ']})
Output:
   Supplier name
0         foobar
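Note that the single regex above removes all three symbols, whereas the code in the question replaced "." and "-" with a space (so ', foo.-bar ' would become 'foo  bar' rather than 'foobar'). If that distinction matters, a minimal sketch that keeps the question's original behavior while still using one chained statement is to drop commas in one call and turn the other two symbols into spaces in a second call:

```python
import pandas as pd

dfd_landing = pd.DataFrame({'Supplier name': [', foo.-bar ']})

dfd_landing['Supplier name'] = (
    dfd_landing['Supplier name']
    .str.replace(',', '', regex=False)       # drop commas entirely
    .str.replace(r'[.-]', ' ', regex=True)   # turn '.' and '-' into spaces
    .str.strip()                             # trim leading/trailing whitespace
)

print(dfd_landing)
```

Here the value comes out as 'foo  bar' (two spaces, one each from "." and "-"), which is what the four separate lines in the question produce.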
Answer:
The following snippet, a minor modification of your code, should work for you:
dfd_landing['Supplier name'] = dfd_landing['Supplier name'].apply(lambda x : x.replace(',','').replace('.',' ').replace('-',' ').strip())
Sample input:
dfd_landing = pd.DataFrame({'Supplier name': ['A,asda', 'B.asdas', 'C-asdasd', ' D ', 'E '],})
   Supplier name
0         A,asda
1        B.asdas
2       C-asdasd
3              D
4              E
Output:
   Supplier name
0          Aasda
1        B asdas
2       C asdasd
3              D
4              E
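One practical difference between the two answers: the .apply(lambda ...) version calls Python str methods on each value, so a missing value (NaN, which is a float) raises AttributeError, while the .str accessor methods pass missing values through as NaN. A minimal sketch of the same replacements written with .str, assuming (hypothetically) that the column may contain a missing value:

```python
import numpy as np
import pandas as pd

dfd_landing = pd.DataFrame(
    {'Supplier name': ['A,asda', 'B.asdas', 'C-asdasd', ' D ', np.nan]}
)

# The lambda version fails here: NaN is a float and has no .replace().
try:
    dfd_landing['Supplier name'].apply(lambda x: x.replace(',', ''))
except AttributeError as exc:
    print('apply failed:', exc)

# The .str accessor leaves missing values as NaN instead of raising.
cleaned = (
    dfd_landing['Supplier name']
    .str.replace(',', '', regex=False)   # remove commas
    .str.replace('.', ' ', regex=False)  # literal '.', not a regex wildcard
    .str.replace('-', ' ', regex=False)  # '-' becomes a space
    .str.strip()                         # trim surrounding whitespace
)
print(cleaned)
```

Passing regex=False explicitly keeps "." literal regardless of the pandas version's default for that parameter.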