I'm trying to apply a simple function (eliminating spaces) across multiple columns of a pandas DataFrame. However, while the .apply() method works properly on a single column, it doesn't work properly over multiple columns. Example:
#Weird Pandas behavior
######
import copy
import pandas as pd

#Input
df = pd.DataFrame({'a' : ["7 7","5 3"],
                   'b' : ['f o', 'b r'],
                   'c' : ["77","53"]})
print(df)
     a    b   c
0  7 7  f o  77
1  5 3  b r  53
df[["a","b"]]=df[["a","b"]].apply(lambda x: x.replace(" ",""))
print(df)
     a    b   c
0  7 7  f o  77
1  5 3  b r  53
df2=copy.deepcopy(df)
print(df2)
     a    b   c
0  7 7  f o  77
1  5 3  b r  53
df2["a"]=df2["a"].apply(lambda x: x.replace(" ",""))
print(df2)
    a    b   c
0  77  f o  77
1  53  b r  53
As you can see, df doesn't change at all when I try to apply the "replace" operation to two columns, but the same dataset (or rather a copy of it) does change when I run the same operation on a single column. How can I remove spaces from two or more columns at once using the .apply() syntax?
I tried passing in the arguments '[a]' (nothing happens) and 'list(a)' (nothing happens) to df[].
CodePudding user response:
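When you call .apply() on a DataFrame with multiple columns, x is a pandas Series (one whole column at a time), not an individual cell value. Series.replace(" ", "") only replaces cells whose entire value is " ", so nothing changes. On a single column, Series.apply passes each string to the lambda, so x.replace is Python's own str.replace, which works on substrings. To get substring replacement on each column, use .str.replace():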
df[["a","b"]]=df[["a","b"]].apply(lambda x: x.str.replace(" ",""))
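For reference, here is a minimal self-contained sketch of the difference, using the same data as the question. The names cols, unchanged, fixed and alt are only illustrative, and the DataFrame.replace(..., regex=True) variant is an alternative I'm adding, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["7 7", "5 3"],
    "b": ["f o", "b r"],
    "c": ["77", "53"],
})
cols = ["a", "b"]  # illustrative: the columns to clean

# DataFrame.apply hands each column to the lambda as a Series.
# Series.replace matches whole cell values, so a lone " " matches nothing.
unchanged = df[cols].apply(lambda s: s.replace(" ", ""))

# Series.str.replace works on substrings inside each string.
fixed = df.copy()
fixed[cols] = fixed[cols].apply(lambda s: s.str.replace(" ", "", regex=False))

# Alternative without apply: DataFrame.replace with regex=True
# also matches substrings.
alt = df.copy()
alt[cols] = alt[cols].replace(" ", "", regex=True)

print(fixed)
```

Both fixed and alt should end up with the spaces removed from columns a and b while c is left alone. If you only ever need substring replacement, the .replace(..., regex=True) form avoids apply entirely.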
CodePudding user response: