I am looking for a short way to run the same operation on multiple columns, typically by running it through .assign.
import pandas as pd
df = pd.DataFrame({'a':[' abc ', 'efg', ' hij'], 'b': [' kkk ', 'eee', 'uuu ']})
In this example I have two columns with strings, some of which have leading and trailing spaces. If I want to remove them I would do something like this:
df.assign(a=lambda x: x["a"].str.strip(), b=lambda x: x["b"].str.strip())
basically repeating the same lambda expression for every column. Is there a more convenient way? I do not want to run a loop, because that cannot be method-chained.
My first idea was something like this, following the "don't repeat yourself" principle:
df.assign({col : lambda x: x[col].str.strip() for col in ['a', 'b']})
which of course does not work.
Any suggestions very welcome!
CodePudding user response:
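You need to unpack the dictionary into keyword arguments using **. You also need to bind col as a default argument of the lambda. Without that, every lambda looks up col only when it is called and sees its last value, 'b', so both columns would end up with the stripped values of b:
df.assign(**{col: lambda x, col=col: x[col].str.strip() for col in ['a', 'b']})
output:
     a    b
0  abc  kkk
1  efg  eee
2  hij  uuu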
ref: PEP 448 for dictionary unpacking
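One point worth checking with lambdas built inside a comprehension is when the loop variable is read. The sketch below compares a late-bound and an early-bound version on the question's sample frame. The names cols, late and bound are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [' abc ', 'efg', ' hij'], 'b': [' kkk ', 'eee', 'uuu ']})
cols = ['a', 'b']

# Late binding: each lambda reads `col` only when assign calls it,
# by which time the comprehension has finished and `col` is 'b'.
late = df.assign(**{col: lambda x: x[col].str.strip() for col in cols})

# Early binding: the default argument captures the value of `col`
# at the moment each lambda is created.
bound = df.assign(**{col: lambda x, col=col: x[col].str.strip() for col in cols})

print(late['a'].tolist())   # column a was overwritten with b's values
print(bound['a'].tolist())  # column a keeps its own stripped values
```

Because assign calls each function with a single positional argument (the frame), the extra col=col default does not interfere with the call.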
CodePudding user response:
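You can apply a lambda to several columns at once by selecting them with double brackets and calling .apply on the result. Note that this assigns in place rather than method-chaining: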
import pandas as pd
df = pd.DataFrame({'a':[' abc ', 'efg', ' hij'], 'b': [' kkk ', 'eee', 'uuu ']})
df[['a', 'b']] = df[['a', 'b']].apply(lambda x : x.str.strip())
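If the result has to stay inside a method chain, the same .apply call can probably be wrapped in .pipe and .assign. This is a minimal sketch, assuming all column names are strings (so the frame can be unpacked with **). The names cols and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [' abc ', 'efg', ' hij'], 'b': [' kkk ', 'eee', 'uuu ']})
cols = ['a', 'b']

# pipe hands the current frame to the function, so this also works mid-chain.
# Unpacking the stripped sub-frame with ** passes each column as a keyword argument.
result = df.pipe(
    lambda d: d.assign(**d[cols].apply(lambda s: s.str.strip()))
)

print(result)
```

Unlike the in-place assignment, this returns a new frame and leaves df unchanged.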