I have this dataset:
df = pd.DataFrame({'name':{0: 'John,Smith', 1: 'Peter,Blue', 2:'Larry,One,Stacy,Orange' , 3:'Joe,Good' , 4:'Pete,High,Anne,Green'}})
yielding:
name
0 John,Smith
1 Peter,Blue
2 Larry,One,Stacy,Orange
3 Joe,Good
4 Pete,High,Anne,Green
I would like to:
- remove commas (replace them by one space)
- wherever one cell holds two persons, insert the "&" symbol after the first person's family name and before the second person's name.
Desired output:
name
0 John Smith
1 Peter Blue
2 Larry One & Stacy Orange
3 Joe Good
4 Pete High & Anne Green
I tried the code below, but it simply removes the commas. I could not find how to insert the "&" symbol in the same code.
df['name']= df['name'].str.replace(r',', '', regex=True)
Disclaimer: all names in this table are fictitious. No identification with actual persons (living or deceased) is intended or should be inferred.
CodePudding user response:
I would do it the following way:
import pandas as pd
df = pd.DataFrame({'name':{0: 'John,Smith', 1: 'Peter,Blue', 2:'Larry,One,Stacy,Orange' , 3:'Joe,Good' , 4:'Pete,High,Anne,Green'}})
df['name'] = df['name'].str.replace(',', ' ').str.replace(r'(\w+ \w+) ', r'\1 & ', regex=True)
print(df)
gives the output
name
0 John Smith
1 Peter Blue
2 Larry One & Stacy Orange
3 Joe Good
4 Pete High & Anne Green
Explanation: first replace the commas with spaces. Then use replace again to match one or more word characters, a space, one or more word characters and a trailing space, and substitute the content of the capturing group (everything but the last space) followed by a space, the & character and a space.
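The two steps described above can be traced on plain Python strings. This is only a sketch using the standard `re` module instead of pandas; the helper name `join_people` is hypothetical and not part of the answer.

```python
import re

def join_people(cell):
    # Step 1: every comma becomes a space.
    spaced = cell.replace(",", " ")
    # Step 2: a "word space word" pair that is followed by another space
    # gets " & " appended. The last pair has no trailing space, so it
    # is left untouched.
    return re.sub(r"(\w+ \w+) ", r"\1 & ", spaced)

for cell in ["John,Smith", "Larry,One,Stacy,Orange"]:
    print(join_people(cell))
```

Note that `\w` does not match apostrophes or hyphens, so names such as O'Neil would probably need a different pattern.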
CodePudding user response:
With single regex replacement:
df['name'].str.replace(r',([^,]+)(,)?', lambda m: f" {m.group(1)}{' & ' if m.group(2) else ''}", regex=True)
0 John Smith
1 Peter Blue
2 Larry One & Stacy Orange
3 Joe Good
4 Pete High & Anne Green
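To make the single-pass idea easier to follow, here is a sketch of the same kind of callable replacement written with the standard `re` module and a named function in place of the lambda. The names `PATTERN`, `fix_commas` and `join_people_single_pass` are hypothetical, chosen only for this illustration.

```python
import re

# A comma, then one field, then optionally the comma that follows the field.
PATTERN = re.compile(r",([^,]+)(,)?")

def fix_commas(match):
    # group(1) is the field after a comma; group(2) is the comma that
    # follows it, or None when that field is the last one in the cell.
    separator = " & " if match.group(2) else ""
    return f" {match.group(1)}{separator}"

def join_people_single_pass(cell):
    # Each match consumes a family name plus the comma after it, so the
    # next match starts at the following person's family-name comma.
    return PATTERN.sub(fix_commas, cell)

print(join_people_single_pass("Pete,High,Anne,Green"))
```

Because the second group is optional, the same pattern covers both the one-person and the two-person cells.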
CodePudding user response:
This should work:
import re
def separate_names(original_str):
    spaces = re.sub(r',([^,]*(?:,|$))', r' \1', original_str)
    return spaces.replace(',', ' & ')
df['spaced'] = df.name.map(separate_names)
df
I created a function called separate_names which uses a regex to replace the odd-numbered commas (the first and third) with spaces. The remaining even-numbered commas are then replaced by " & " using the string replace method. Finally, I used map to apply separate_names to each row and stored the result in a new column, spaced.
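The odd/even pairing of commas can also be expressed without a regex, by splitting each cell and grouping the fields two at a time. This is a different technique from the one in this answer, shown only as a sketch; it assumes every person is given as exactly two comma-separated fields, and the name `pair_names` is hypothetical.

```python
def pair_names(cell):
    parts = cell.split(",")
    # Take the fields two at a time: (first name, family name).
    people = [" ".join(parts[i:i + 2]) for i in range(0, len(parts), 2)]
    # Persons are joined with " & "; a single person has nothing to join.
    return " & ".join(people)

for cell in ["Joe,Good", "Pete,High,Anne,Green"]:
    print(pair_names(cell))
```

It could be applied the same way as separate_names, for example with df['name'].map(pair_names).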
CodePudding user response:
In the replace statement you should replace the comma with a space. Put a space between the quotes of the second argument, so that '' becomes ' ':
df['name']= df['name'].str.replace(r',', ' ', regex=True)
The only change from your code is the space inside the second pair of quotes.