I'm trying to correctly split Email addresses like so:
Email | Provider | Domain |
---|---|---|
[email protected] | something | com |
[email protected] | something | city.com |
In other words, I would like to create two columns, df['Provider'] and df['Domain'], starting from df['Email'].
My first failed attempt was:
df['Provider'] = (df['Email'].str.split('@').str[1]).str.split('.').str[0]
df['Domain'] = (df['Email'].str.split('@').str[1]).str.split('.').str[1]
but it cuts the domain short when the part after the @ contains more than one dot, and the result looks like this:
Email | Provider | Domain |
---|---|---|
[email protected] | something | com |
[email protected] | something | city |
I have found a function that solves this problem, but I'm struggling to understand how to apply it correctly to my df:
email = '[email protected]'
f1, f2 = email.rsplit('@',1)
provider_email, domain_email = [f2.rsplit('.')[i] for i in (0,-1)]
I don't necessarily have to use this function, but it is the best I could find to solve my problem. Do you have any advice?
Thank you
CodePudding user response:
You can limit the number of splits (pass n as a keyword; recent pandas versions no longer accept it positionally):
df['Provider'] = (df['Email'].str.split('@').str[1]).str.split('.', n=1).str[0]
df['Domain'] = (df['Email'].str.split('@').str[1]).str.split('.', n=1).str[1]
To get both parts at once, also try:
(df['Email'].str.split('@').str[1]).str.split('.', n=1, expand=True)
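As a minimal end-to-end sketch of the same idea, the snippet below builds a small DataFrame and fills both columns in one assignment. The sample addresses are hypothetical stand-ins (the real ones are obscured in the question), and `host` is just an illustrative variable name.

```python
import pandas as pd

# Hypothetical sample addresses; the originals are obscured in the post.
df = pd.DataFrame({"Email": ["alice@something.com", "bob@something.city.com"]})

# Keep only the part after the last '@'.
host = df["Email"].str.rsplit("@", n=1).str[-1]

# Split the host once, at its first dot, into two new columns.
df[["Provider", "Domain"]] = host.str.split(".", n=1, expand=True)

print(df)
```

With these two sample rows, Provider is something in both cases, and Domain is com and city.com respectively.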
CodePudding user response:
df['Provider'] = (df['Email'].str.split('@').str[1]).str.split('.').str[0]
df['Domain'] = (df['Email'].str.split('@').str[1]).str.split('.').str[-1]
output:
Email Provider Domain
0 [email protected] something com
1 [email protected] something com
Rather than taking the element at index 1 after splitting on '.', take the last element (index -1). This returns only the final label of the host, so something.city.com yields com, as in the output above.
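The two answers treat a host with more than one dot differently, so it may help to see both conventions side by side in plain Python. This is only a sketch: `split_host` and its `last_label_only` flag are hypothetical names introduced here, and the address is a made-up example.

```python
def split_host(email, last_label_only=False):
    """Return (provider, domain) for an address like 'user@something.city.com'."""
    # Keep only the part after the last '@'.
    host = email.rsplit("@", 1)[-1]
    # The provider is the text before the first dot; rest is what follows it.
    provider, _, rest = host.partition(".")
    if last_label_only:
        # Convention of the second answer: keep only the label after the last dot.
        domain = rest.rsplit(".", 1)[-1]
    else:
        # Convention of the first answer: keep everything after the first dot.
        domain = rest
    return provider, domain


print(split_host("bob@something.city.com"))                        # ('something', 'city.com')
print(split_host("bob@something.city.com", last_label_only=True))  # ('something', 'com')
```

Applied to a column, `df['Email'].map(split_host)` would give a Series of (provider, domain) tuples. Note that in this sketch a host without any dot produces an empty domain.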