This is very closely related to Removing space from columns in pandas, so I wasn't sure whether to just add it as a comment there...
The difference in my question is that it relates specifically to using a .loc locator to slice out a subset...
df['py'] = df['py'].str.replace(' ','')
-- this works fine, but when I only want to apply it to the subset of rows where 'column' is 'foo':
df.loc[df['column'] == 'foo']['py'] = df.loc[df['column'] == 'foo']['py'].str.replace(' ','')
...doesn't work.
What am I doing wrong? I can always slice out the group and re-append it, but I'm curious where I'm going wrong here.
A dataset for trials:
df = pd.DataFrame({'column':['foo','foo','bar','bar'], 'py':['a b','a b','a b','a b']})
Thanks
CodePudding user response:
You want:
df.loc[df['column'] == 'foo', 'py'] = df.loc[df['column'] == 'foo', 'py'].apply(lambda x: x.replace(' ', ''))
Note the notation of .loc: the boolean row mask and the column label go inside a single .loc[row_mask, column] call, and the result is assigned back through that same .loc. The chained version in the question, df.loc[df['column'] == 'foo']['py'] = ..., assigns to a temporary copy, so the original DataFrame is never updated.
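As a quick sanity check, here is a minimal sketch using the sample DataFrame from the question (the expected-result comment is mine, not part of the original answer):
import pandas as pd

df = pd.DataFrame({'column': ['foo', 'foo', 'bar', 'bar'], 'py': ['a b', 'a b', 'a b', 'a b']})
# apply the replacement only where 'column' == 'foo', assigning back through .loc
df.loc[df['column'] == 'foo', 'py'] = df.loc[df['column'] == 'foo', 'py'].apply(lambda x: x.replace(' ', ''))
print(df)  # 'py' should now be 'ab' on the 'foo' rows and still 'a b' on the 'bar' rows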
CodePudding user response:
The pandas string accessor also supports regex (in recent pandas versions you need to pass regex=True for the pattern to be treated as a regular expression):
>>> pd.DataFrame({"column_1": ["hello ", " world", "space in the middle", "two spaces", "one\ttab"]}).column_1.str.replace(r"\s+", "", regex=True)
0               hello
1               world
2    spaceinthemiddle
3           twospaces
4              onetab
Combine that with numpy.where() and I think you have what you need:
df[column_name] = np.where(
    <condition>,                                            # boolean mask: which rows to edit
    df[column_name].str.replace(r"\s+", "", regex=True),    # the substitution to make on those rows
    df[column_name],                                        # the default value used on the other rows
)
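For the sample DataFrame from the question, filling in the placeholders might look like the following minimal sketch (the commented output is what I'd expect, shown for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': ['foo', 'foo', 'bar', 'bar'], 'py': ['a b', 'a b', 'a b', 'a b']})
df['py'] = np.where(
    df['column'] == 'foo',                           # which rows to edit
    df['py'].str.replace(r"\s+", "", regex=True),    # strip all whitespace on those rows
    df['py'],                                        # leave the other rows untouched
)
print(df)
#   column   py
# 0    foo   ab
# 1    foo   ab
# 2    bar  a b
# 3    bar  a b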