This is very closely related to Removing space from columns in pandas, so I wasn't sure whether to just add it as a comment there...
The difference in my question is that it relates specifically to using a .loc locator to slice out a subset...
df['py'] = df['py'].str.replace(' ','')
-- this works fine, but when I only want to apply it to the subset of rows where 'column' is 'foo':
df.loc[df['column'] == 'foo']['py'] = df.loc[df['column'] == 'foo']['py'].str.replace(' ','')
...doesn't work.
What am I doing wrong? I can always slice out the group and re-append it, but I'm curious where I'm going wrong here.
A dataset for trials:
df = pd.DataFrame({'column':['foo','foo','bar','bar'], 'py':['a b','a b','a b','a b']})
Thanks
CodePudding user response:
You want:
df.loc[df['column'] == 'foo', 'py'] = df.loc[df['column'] == 'foo', 'py'].apply(lambda x: x.replace(' ', ''))
Note the notation of .loc: the boolean row mask and the column label go inside a single .loc[row_mask, column] call, and the result is assigned back through that same .loc. The chained version in the question, df.loc[df['column'] == 'foo']['py'] = ..., assigns to a temporary copy, so the original DataFrame is never updated.
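As a quick sanity check, here is a minimal sketch using the sample DataFrame from the question (the expected-result comment is mine, not part of the original answer):
import pandas as pd

df = pd.DataFrame({'column': ['foo', 'foo', 'bar', 'bar'], 'py': ['a b', 'a b', 'a b', 'a b']})
# apply the replacement only where 'column' == 'foo', assigning back through .loc
df.loc[df['column'] == 'foo', 'py'] = df.loc[df['column'] == 'foo', 'py'].apply(lambda x: x.replace(' ', ''))
print(df)  # 'py' should now be 'ab' on the 'foo' rows and still 'a b' on the 'bar' rows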
CodePudding user response:
The pandas string accessor also supports regex (in recent pandas versions you need to pass regex=True for the pattern to be treated as a regular expression):
>>> pd.DataFrame({"column_1": ["hello ", " world", "space in the middle", "two spaces", "one\ttab"]}).column_1.str.replace(r"\s+", "", regex=True)
0               hello
1               world
2    spaceinthemiddle
3           twospaces
4              onetab
Combine that with numpy.where() and I think you have what you need:
df[column_name] = np.where(
    <condition>,                                            # boolean mask: which rows to edit
    df[column_name].str.replace(r"\s+", "", regex=True),    # the substitution to make on those rows
    df[column_name],                                        # the default value used on the other rows
)
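For the sample DataFrame from the question, filling in the placeholders might look like the following minimal sketch (the commented output is what I'd expect, shown for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': ['foo', 'foo', 'bar', 'bar'], 'py': ['a b', 'a b', 'a b', 'a b']})
df['py'] = np.where(
    df['column'] == 'foo',                           # which rows to edit
    df['py'].str.replace(r"\s+", "", regex=True),    # strip all whitespace on those rows
    df['py'],                                        # leave the other rows untouched
)
print(df)
#   column   py
# 0    foo   ab
# 1    foo   ab
# 2    bar  a b
# 3    bar  a b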