Remove spaces after commas in a Python DataFrame column

df1

ID                       Col
1       new york, london school of economics, america
2       california & washington,  harvard university, america

Expected output:

df1

ID                       Col
1       new york,london school of economics,america
2       california & washington,harvard university,america

My attempt:

df1[Col].apply(lambda x : x.str.replace(", ","", regex=True))

Answer:

It is advisable to use the regular expression ,\s+ which matches a comma followed by one or more whitespace characters. That way it also handles several consecutive spaces after a comma, as in "washington,  harvard":

import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'Col': ['new york,           london school of economics,  america',
                                         'california & washington,  harvard university, america']}).set_index('ID')
# ',\s+' = a comma followed by one or more whitespace characters
df['Col'] = df['Col'].str.replace(r',\s+', ',', regex=True)
print(df)

                                                  Col
ID                                                   
1         new york,london school of economics,america
2   california & washington,harvard university,ame...
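The same pattern behaves identically with Python's standard re module, which can be a quick way to check it on single strings before applying it to a column. A minimal sketch; the sample strings (including the tab case) are made up for illustration:

```python
import re

# ',\s+' matches a comma followed by one or more whitespace characters
# (spaces, tabs, newlines); the whole match is replaced by a bare comma.
PATTERN = re.compile(r",\s+")

samples = [
    "new york, london school of economics, america",
    "california & washington,  harvard university, america",
    "a,\tb",  # a tab after the comma is also matched by \s
]

cleaned = [PATTERN.sub(",", s) for s in samples]
for s in cleaned:
    print(s)
```

Because \s+ consumes the whole whitespace run, one space, two spaces, or a tab all collapse to a single comma.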

Answer:

You can call str.replace(', ', ',') on the column directly instead of using a lambda. However, this only works when there is exactly one space after each comma.

As Алексей Р mentioned, str.replace(r',\s+', ',', regex=True) is needed to catch any number of spaces after a comma.

Reference: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
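To see the difference between the two calls, here is a small side-by-side sketch on the rows from the question (the variable names are illustrative only):

```python
import pandas as pd

s = pd.Series(["new york, london school of economics, america",
               "california & washington,  harvard university, america"])

# Literal replacement: removes exactly one space after each comma.
literal = s.str.replace(", ", ",", regex=False)

# Regex replacement: removes any run of whitespace after each comma.
regex = s.str.replace(r",\s+", ",", regex=True)

print(literal.tolist())
print(regex.tolist())
```

In the second row the literal version leaves one space behind ("washington, harvard"), because only the comma and the first of the two spaces form the matched ", " substring.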

Example:

import pandas as pd

data_ = ['new york, london school of economics, america', 'california & washington,  harvard university, america']

df1 = pd.DataFrame(data_)
df1.columns = ['Col']
df1.index.name = 'ID'
df1.index = df1.index + 1  # start the ID index at 1 instead of 0

df1['Col'] = df1['Col'].str.replace(r',\s+', ",", regex=True)

print(df1)

Result:

                                                  Col
ID                                                   
1         new york,london school of economics,america
2   california & washington,harvard university,ame...

Answer:

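Calling apply on the whole DataFrame instead of on the column works, because the lambda then receives a Series (with axis=1, one row at a time), so the .str accessor is available. This assumes every column holds strings (e.g. ID is the index). Note that ', ' only removes a single space after each comma, and the result has to be assigned back:

df = df.apply(lambda x: x.str.replace(', ', ',', regex=True), axis=1)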
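For context, the attempt in the question fails because Series.apply hands the lambda each cell as a plain Python str, which has no .str accessor. Inside such a lambda you can call re.sub on the string directly. A sketch assuming the df1 layout from the question (Col column, ID as the index):

```python
import re
import pandas as pd

df1 = pd.DataFrame(
    {"Col": ["new york, london school of economics, america",
             "california & washington,  harvard university, america"]},
    index=pd.Index([1, 2], name="ID"),
)

# Series.apply passes one str at a time, so x.str raises AttributeError.
try:
    df1["Col"].apply(lambda x: x.str.replace(", ", ","))
except AttributeError as exc:
    print("AttributeError:", exc)

# Working on the plain string instead: re.sub with the same ',\s+' pattern.
df1["Col"] = df1["Col"].apply(lambda x: re.sub(r",\s+", ",", x))
print(df1)
```

The vectorized df1['Col'].str.replace(...) from the other answers is usually preferable; this version only shows where the original lambda went wrong.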

Answer:

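You can split each string on ',', strip the surrounding whitespace from each piece, and join the pieces back together with ','. Assign the result back to the column so df1 stays a DataFrame:

df1['Col'] = df1['Col'].apply(lambda x: ",".join(w.strip() for w in x.split(',')))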
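The same split/strip/join idea as a small named function, which is easier to test on its own. The name clean_commas is hypothetical. One behavioral difference from the regex answers: strip() also removes spaces before a comma and at either end of the string.

```python
import pandas as pd

def clean_commas(text: str) -> str:
    """Split on commas, trim whitespace around each piece, and rejoin with ','."""
    return ",".join(part.strip() for part in text.split(","))

df1 = pd.DataFrame(
    {"Col": ["new york, london school of economics, america",
             "california & washington,  harvard university, america"]},
    index=pd.Index([1, 2], name="ID"),
)

# Assign back to the column so df1 stays a DataFrame.
df1["Col"] = df1["Col"].apply(clean_commas)
print(df1)

print(clean_commas("a ,  b , c"))  # spaces before commas are removed too
```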

Hope this helps.