I have a dataframe that contains strings of email addresses in the format:
d = {'Country':'A', 'Email':'[email protected],[email protected],[email protected]'}
df = pd.DataFrame(data=d, index=[0])
and I want only the username part of each email. So the new dataframe should look like this:
d = {'Country':'A', 'Email':'123,456,789'}
df1 = pd.DataFrame(data=d, index=[0])
The best way I could think of is to split the original string by comma, delete the domain part of each email, and join the list back together. Is there a better way to solve this problem?
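For reference, the split/strip/join approach described above could be sketched roughly as follows. The `example.com` domain and the helper name `strip_domains` are assumptions made for illustration; they are not from the original data.

```python
import pandas as pd

# Hypothetical sample data: the example.com domain is a placeholder.
df = pd.DataFrame({
    'Country': ['A'],
    'Email': ['123@example.com,456@example.com,789@example.com'],
})

def strip_domains(emails: str) -> str:
    """Keep only the part before '@' in each comma-separated address."""
    return ','.join(addr.split('@', 1)[0] for addr in emails.split(','))

# Build a new dataframe and leave the original untouched.
df1 = df.assign(Email=df['Email'].apply(strip_domains))
print(df1)
```

This works, but it runs a Python function per row, which is why a vectorized string method may be preferable on larger frames.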
CodePudding user response:
This is a regex question rather than a Pandas question, but here's a solution that returns a list (which you can join back into a string):
import re
df['Email'].apply(lambda s: re.findall(r'\w+(?=@)', s))
Output:
0 [123, 456, 789]
Name: Email, dtype: object
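If a string is wanted instead of a list, the matches can be joined inside the same lambda. This is a minimal sketch on assumed sample data (the `example.com` domain is a placeholder). Note that `\w+` only matches letters, digits, and underscores, so a username containing a dot or hyphen would be truncated to its last segment.

```python
import re
import pandas as pd

# Hypothetical sample data: the example.com domain is a placeholder.
df = pd.DataFrame({
    'Country': ['A'],
    'Email': ['123@example.com,456@example.com,789@example.com'],
})

# Find every run of word characters directly followed by '@', then rejoin with commas.
usernames = df['Email'].apply(lambda s: ','.join(re.findall(r'\w+(?=@)', s)))
print(usernames)
```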
CodePudding user response:
If you want a string as output, you can remove everything from each @ up to the next comma. Use str.replace with the regex @[^,]+:
df['Email'] = df['Email'].str.replace(r'@[^,]+', '', regex=True)
Output:
Country Email
0 A 123,456,789
For a list, you could use str.findall:
df['Email'] = df['Email'].str.findall(r'[^,]+(?=@)')
Output:
Country Email
0 A [123, 456, 789]