Split Email Address (attention to second level domain!)

Time:07-23

I'm trying to split email addresses correctly, like this:

Email              Provider    Domain
[email protected]  something   com
[email protected]  something   city.com

In other words, I would like to create two columns, df['Provider'] and df['Domain'], from df['Email'].

My first (failed) attempt was:

df['Provider'] = (df['Email'].str.split('@').str[1]).str.split('.').str[0]

df['Domain'] = (df['Email'].str.split('@').str[1]).str.split('.').str[1]

but it doesn't handle domains with more than one dot, and the result looks like this:

Email              Provider    Domain
[email protected]  something   com
[email protected]  something   city
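The failure comes from the unlimited split: for a domain part such as something.city.com (an assumed value, since the real addresses are obfuscated on this page), split('.') returns three pieces, and index 1 is the middle label rather than the rest of the domain. A small plain-Python illustration:

```python
# Hypothetical domain parts; the original addresses are not visible on this page.
domains = ["something.com", "something.city.com"]

# Unlimited split: every dot separates, so index 1 is the TLD only when
# the domain has exactly one dot.
second = [d.split(".")[1] for d in domains]

for d, s in zip(domains, second):
    print(d.split("."), "->", s)
```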

I found a snippet that solves this problem, but I'm struggling to understand how to apply it to my DataFrame:

email = '[email protected]'

f1, f2 = email.rsplit('@',1)

provider_email, domain_email = [f2.rsplit('.')[i] for i in (0,-1)]
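One way to apply that snippet to a whole column is to wrap it in a function and call Series.apply. This is a sketch, assuming made-up sample addresses and a hypothetical helper name, split_email. Note that, like the snippet, it keeps only the first and last labels, so something.city.com yields Domain "com":

```python
import pandas as pd

# Hypothetical sample data; the original addresses are obfuscated on this page.
df = pd.DataFrame({"Email": ["user@something.com", "user@something.city.com"]})

def split_email(email):
    """Hypothetical helper wrapping the snippet: returns (provider, domain)."""
    _, host = email.rsplit("@", 1)   # everything after the last '@'
    labels = host.split(".")
    return labels[0], labels[-1]     # first and last labels only

# apply() calls the helper once per row and returns a Series of tuples;
# .str[i] then picks one element from each tuple.
parts = df["Email"].apply(split_email)
df["Provider"] = parts.str[0]
df["Domain"] = parts.str[1]
print(df)
```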

I don't necessarily have to use this snippet, but it's the best I could find. Do you have any advice?

Thank you

Answer:

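You can limit the number of splits with n=1, so only the first dot is used (newer pandas versions require n to be passed as a keyword):

df['Provider'] = df['Email'].str.split('@').str[1].str.split('.', n=1).str[0]

df['Domain'] = df['Email'].str.split('@').str[1].str.split('.', n=1).str[1]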

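Alternatively, expand=True returns both parts as separate columns, which you can assign in one step:

df[['Provider', 'Domain']] = df['Email'].str.split('@').str[1].str.split('.', n=1, expand=True)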
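The same "split once at the first dot after @" rule can also be written as a single regular expression with Series.str.extract, whose named groups become the column names. This is an alternative sketch, not part of the answer above; the sample addresses are assumed:

```python
import pandas as pd

# Hypothetical sample data (the original addresses are obfuscated on this page).
df = pd.DataFrame({"Email": ["user@something.com", "user@something.city.com"]})

# After the last '@': capture up to the first dot as Provider,
# and everything after that dot as Domain.
pattern = r"@(?P<Provider>[^.@]+)\.(?P<Domain>[^@]+)$"
parts = df["Email"].str.extract(pattern)

df = df.join(parts)   # aligns on the index, adding Provider and Domain
print(df)
```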

Answer:

df['Provider'] = (df['Email'].str.split('@').str[1]).str.split('.').str[0]

df['Domain'] = (df['Email'].str.split('@').str[1]).str.split('.').str[-1]

output:

                             Email   Provider Domain
0       [email protected]  something    com
1  [email protected]  something    com

Rather than taking the first element after splitting on '.', take the last one (str[-1]); for these addresses that returns the top-level domain, com.
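Which rule fits depends on what Domain should contain: taking the last piece returns only the final label and drops any middle label, while a single split at the first dot keeps it. A small comparison, assuming a three-label host something.city.com:

```python
import pandas as pd

# Hypothetical host part of an address with three labels.
host = pd.Series(["something.city.com"])

labels = host.str.split(".")
last_label = labels.str[-1]                         # last piece only
after_first_dot = host.str.split(".", n=1).str[1]   # split once at the first dot

print(labels.str[0].iloc[0], last_label.iloc[0], after_first_dot.iloc[0])
```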
