retrieve multiple substrings from a string in pandas dataframe

Time:10-14

I have a dataframe containing strings of email addresses in this format:

import pandas as pd

d = {'Country': ['A'], 'Email': ['[email protected],[email protected],[email protected]']}
df = pd.DataFrame(data=d)

I want only the username part of each email, so the new dataframe should look like this:

d = {'Country': ['A'], 'Email': ['123,456,789']}
df1 = pd.DataFrame(data=d)

The best approach I could think of is to split the original string on commas, strip the domain part from each email, and join the list back together. Is there a better way to solve this problem?
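
The split, strip and join approach described above can be sketched roughly as follows. This is only a minimal sketch: the example.com domain and the helper name usernames are assumptions made for illustration, since the original addresses are redacted.

```python
import pandas as pd

# Hypothetical sample data: the example.com domain is an assumption,
# because the addresses in the question are redacted.
df = pd.DataFrame({
    'Country': ['A'],
    'Email': ['123@example.com,456@example.com,789@example.com'],
})

def usernames(cell):
    # Split on commas, keep the text before the first '@', then rejoin.
    return ','.join(addr.split('@', 1)[0] for addr in cell.split(','))

# assign returns a new dataframe and leaves df untouched.
df1 = df.assign(Email=df['Email'].map(usernames))
print(df1)
```

This works without regular expressions, but it runs a Python function per row, which is the part the regex-based answers try to avoid.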

CodePudding user response:

This is a regex question rather than a Pandas question, but here is a solution that returns a list (which you can join back into a string):

import re

df['Email'].apply(lambda s: re.findall(r'\w+(?=@)', s))

Output:

0    [123, 456, 789]
Name: Email, dtype: object
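
One caveat worth checking on real data: \w only matches letters, digits and underscores, so a username containing a dot would be cut short. The sketch below joins the matches back into a string and compares \w+ with a broader character class. The example.com domain and the john.doe username are assumptions chosen for illustration.

```python
import re
import pandas as pd

# Hypothetical data: example.com and the dotted username are assumptions.
df = pd.DataFrame({
    'Country': ['A'],
    'Email': ['123@example.com,john.doe@example.com'],
})

# \w+ stops at the dot, so only the part right before '@' is captured.
word_only = df['Email'].apply(lambda s: ','.join(re.findall(r'\w+(?=@)', s)))

# [^,@]+ accepts anything except a comma or '@', so the whole username is kept.
full_name = df['Email'].apply(lambda s: ','.join(re.findall(r'[^,@]+(?=@)', s)))

print(word_only[0])   # 123,doe
print(full_name[0])   # 123,john.doe
```

For purely numeric usernames like the ones in the question, both patterns give the same result.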

CodePudding user response:

If you want a string as output, you can remove everything from each @ up to the next comma. Use str.replace with the regex @[^,]+:

df['Email'] = df['Email'].str.replace(r'@[^,]+', '', regex=True)

Output:

  Country        Email
0       A  123,456,789

For a list you could use str.findall:

df['Email'] = df['Email'].str.findall(r'[^,]+(?=@)')

Output:

  Country            Email
0       A  [123, 456, 789]
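
If the list is only an intermediate step, the str accessor can also join it back into a string without leaving pandas. A minimal sketch, assuming an example.com domain in place of the redacted addresses and using the hypothetical column names Usernames and Joined:

```python
import pandas as pd

# Hypothetical sample data: the example.com domain is an assumption.
df = pd.DataFrame({
    'Country': ['A'],
    'Email': ['123@example.com,456@example.com,789@example.com'],
})

# str.findall returns one list of usernames per row.
df['Usernames'] = df['Email'].str.findall(r'[^,]+(?=@)')

# str.join concatenates each row's list with the given separator.
df['Joined'] = df['Usernames'].str.join(',')
print(df[['Country', 'Joined']])
```

Keeping the list column around is useful when the individual usernames are needed later, for example for filtering or counting.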