df1
ID Col
1 new york, london school of economics, america
2 california & washington, harvard university, america
Expected output:
df1
ID Col
1 new york,london school of economics,america
2 california & washington,harvard university,america
My attempt:
df1[Col].apply(lambda x : x.str.replace(", ","", regex=True))
CodePudding user response:
It is advisable to use the regular expression ,\s+
which matches a comma followed by one or more whitespace characters, so it also handles several consecutive spaces after a comma (for example, if the input contained washington,   harvard).
import pandas as pd
df = pd.DataFrame({'ID': [1, 2], 'Col': ['new york, london school of economics, america',
'california & washington, harvard university, america']}).set_index('ID')
# ',\s+' matches a comma plus all whitespace that follows it
df.Col = df.Col.str.replace(r',\s+', ',', regex=True)
print(df)
Col
ID
1 new york,london school of economics,america
2 california & washington,harvard university,ame...
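To see why the + quantifier matters, the pattern can be tried outside pandas with Python's re module, whose regex syntax pandas uses when regex=True. This is only a minimal sketch: the sample string, with three spaces after one comma and none after another, is made up for illustration and is not from the question's data.

```python
import re

# Hypothetical sample: one, three, and zero spaces after the commas.
text = "new york, london school of economics,   america,usa"

# Literal ", " -> "," removes only a single space after each comma,
# so two of the three spaces before "america" survive.
literal = text.replace(", ", ",")

# ",\s+" removes the whole run of whitespace that follows a comma.
pattern = re.sub(r",\s+", ",", text)

print(literal)
print(pattern)
```

The literal replacement is enough for the data shown in the question, where every comma is followed by exactly one space; the regex version is the safer choice if the spacing is inconsistent.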
CodePudding user response:
You can use str.replace(', ', ',')
directly instead of a lambda function. However, this only works if there is exactly one space after each ",".
As Алексей Р mentioned, str.replace(r',\s+', ',', regex=True)
is needed to catch any extra spaces after ",".
Reference: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
Example:
import pandas as pd
data_ = ['new york, london school of economics, america', 'california & washington, harvard university, america']
df1 = pd.DataFrame(data_)
df1.columns = ['Col']
df1.index.name = 'ID'
df1.index = df1.index + 1
df1['Col'] = df1['Col'].str.replace(r',\s+', ",", regex=True)
print(df1)
Result:
Col
ID
1 new york,london school of economics,america
2 california & washington,harvard university,ame...
CodePudding user response:
If you specify the axis, it works:
df.apply(lambda x: x.str.replace(', ',',',regex=True),axis=1)
CodePudding user response:
You can split the string on ',',
strip the extra whitespace from each piece, and join the list back together. Assign the result back to the column so that df1 stays a DataFrame:
df1['Col'] = df1['Col'].apply(lambda x: ",".join([w.strip() for w in x.split(',')]))
Hope this helps.
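The split-and-strip idea above can also be written as a small standalone helper, which makes it easy to check on plain strings before applying it to a column. This is a rough sketch: the function name normalize_commas is hypothetical, and the third sample row (with a space before the comma) is invented to show one difference from the regex approach.

```python
def normalize_commas(value: str) -> str:
    """Split on commas, trim each piece, and rejoin without spaces."""
    return ",".join(piece.strip() for piece in value.split(","))


rows = [
    "new york, london school of economics, america",
    "california & washington, harvard university, america",
    "a ,  b",  # invented row: whitespace on both sides of the comma
]

cleaned = [normalize_commas(row) for row in rows]
for row in cleaned:
    print(row)
```

Note that strip() trims both ends of each piece, so this version also removes whitespace before a comma and at the start or end of the string, which a ,\s+ replacement leaves alone. With pandas, the same helper could be passed to df1['Col'].apply(normalize_commas).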