I'm stuck on a simple task. I have a test dataframe with spaces in its values. To remove them, I did the following:
df_unique['final'] = df_unique['final'].astype("string")
df_unique['final'] = df_unique['final'].str.strip()
df_unique['final'] = df_unique['final'].str.replace(' ', '')
But the spaces are still there:
df_unique =
final
123 123
123 123 123
12345 123
df_unique.info() shows the column as string.
I think it is not working for numbers with double spaces. I'm not sure, but maybe this information helps.
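Note that str.replace(' ', '') removes every ASCII space, doubled or not, so gaps that survive it are probably some other whitespace character. One way to check is to print the repr of each value and the Unicode names of its non-digit characters. The sketch below uses hypothetical sample data in which one gap is assumed to be a non-breaking space (U+00A0); the question doesn't say what the real characters are, and gap_characters is a helper made up for this example.

```python
import unicodedata

import pandas as pd

# Hypothetical sample: one gap is a non-breaking space (U+00A0) and one is a
# double ASCII space. This is an assumption, not data from the question.
df_unique = pd.DataFrame({"final": ["123\xa0123", "123  123 123", "12345 123"]})
df_unique["final"] = df_unique["final"].astype("string")


def gap_characters(value: str) -> list[str]:
    """Return the Unicode names of every non-digit character in value."""
    return [unicodedata.name(ch, repr(ch)) for ch in value if not ch.isdigit()]


# Print each value's repr together with the names of its separator characters.
for value in df_unique["final"]:
    print(repr(value), gap_characters(value))
```

A name other than SPACE (for example NO-BREAK SPACE) would explain why the literal replacement leaves the gap in place.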
Answer:
Considering that the dataframe is called df and looks like the following:
final
0 123 123
1 123 123 123
2 12345 123
Assuming that the goal is to create a new column, let's call it new, that stores the values of the column final without the spaces, one can apply a custom lambda function using re as follows:
import re
# \s matches any whitespace character (spaces, tabs, non-breaking spaces, ...)
df['new'] = df['final'].apply(lambda x: re.sub(r'\s', '', x))
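To show why \s can succeed where replace(' ', '') fails, here is a self-contained sketch that compares the two. The sample data is hypothetical: it assumes the separators include a non-breaking space and a tab, which the question does not confirm. The literal column is added only for comparison.

```python
import re

import pandas as pd

# Hypothetical sample mixing an ASCII space, a non-breaking space (U+00A0)
# and a tab. The real data's separators are not known from the question.
df = pd.DataFrame({"final": ["123 123", "123\xa0123 123", "12345\t123"]})

# Literal replacement only removes the ASCII space U+0020.
df["literal"] = df["final"].str.replace(" ", "", regex=False)

# In a str pattern, \s matches Unicode whitespace, including U+00A0 and tabs.
df["new"] = df["final"].apply(lambda x: re.sub(r"\s", "", x))

print(df[["literal", "new"]])
```

The literal column still contains the non-breaking space and the tab, while new contains digits only.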
[Out]:
final new
0 123 123 123123
1 123 123 123 123123123
2 12345 123 12345123
If one wants to overwrite the column final instead, do the following:
df['final'] = df['final'].apply(lambda x: re.sub(r'\s', '', x))
[Out]:
final
0 123123
1 123123123
2 12345123
Another option for this last use case is pandas.Series.str.replace:
df['final'] = df['final'].str.replace(r'\s', '', regex=True)
[Out]:
final
0 123123
1 123123123
2 12345123
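One practical difference between the two approaches is how they handle missing values. The vectorized .str accessor propagates pd.NA, but re.sub inside apply() receives pd.NA and raises TypeError. The sketch below assumes a hypothetical Series that contains a missing value. It also uses \s+, a variant of the answer's pattern that removes each run of whitespace in a single match and gives the same result here.

```python
import pandas as pd

# Hypothetical data with a missing value. astype("string") turns None into pd.NA.
s = pd.Series(["123 123", "123 123 123", None]).astype("string")

# The .str accessor skips missing values and keeps them as pd.NA.
cleaned = s.str.replace(r"\s+", "", regex=True)

print(cleaned.tolist())
```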
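Note:
- One needs to pass regex=True. In older pandas versions, omitting it triggers
FutureWarning: The default value of regex will change from True to False in a future version
and since pandas 2.0 the default is regex=False, so r'\s' would be treated as the literal text \s and nothing would be replaced.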
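Passing regex explicitly avoids relying on a default that has changed between versions. This short sketch, using the sample values from the answer, shows what each setting does with the same pattern.

```python
import pandas as pd

s = pd.Series(["123 123", "123 123 123", "12345 123"], dtype="string")

# regex=True: r"\s" is a regular expression matching any whitespace character.
as_regex = s.str.replace(r"\s", "", regex=True)

# regex=False: r"\s" is the literal two-character text backslash + "s".
# That text never occurs in these values, so nothing changes.
as_literal = s.str.replace(r"\s", "", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```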