How can I parse the email addresses below into the expected output? They are not in a DataFrame; they are separate strings, and I have a loop that goes through each string.
Example input:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Expected output:
Louis Stevens
Louis Stevens
Louis Stevens
Louis Stevens
Mike Williams
Lebron James
Thanks
CodePudding user response:
Assuming s is the input Series, and using str.replace:
import re
s.str.replace(r'^([a-z]+)\.(?:.\.)?([a-z]+).*', r'\1 \2', regex=True, flags=re.I)
Output:
0 Louis Stevens
1 Louis Stevens
2 Louis Stevens
3 Louis Stevens
4 Mike Williams
5 Lebron James
dtype: object
For individual strings:
import re
s = '[email protected]'
out = re.sub(r'^([a-z]+)\.(?:.\.)?([a-z]+).*', r'\1 \2', s, flags=re.I)
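Since the question loops over separate strings, the single-string version can be wrapped in a small helper. This is only a sketch: the addresses are hypothetical (the originals are redacted, so the local parts are guessed from the outputs shown in this thread and the domain is made up), and email_to_name is a name chosen here for illustration.

```python
import re

# Hypothetical addresses; the real ones are redacted in the thread.
emails = [
    "Louis.Stevens@example.com",
    "Louis.a.Stevens@example.com",
    "Louis.Stevens2@example.com",
    "Mike.Williams2@example.com",
]

# Group 1: first name. Optional single-character middle part followed by a dot.
# Group 2: last name. The trailing .* swallows digits, "@" and the domain.
NAME_RE = re.compile(r'^([a-z]+)\.(?:.\.)?([a-z]+).*', flags=re.I)

def email_to_name(email):
    """Return 'First Last'; a string that does not match is returned unchanged."""
    return NAME_RE.sub(r'\1 \2', email)

names = [email_to_name(e) for e in emails]
print(names)
```

Compiling the pattern once outside the loop keeps the loop body to a single call.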
CodePudding user response:
Remove the @ and everything after it with the regex @.*:
import pandas as pd
s = pd.Series("""[email protected]
[email protected]
[email protected]
[email protected]""".splitlines())
s.str.replace('@.*', '', regex=True)
#0 Louis.Stevens
#1 Louis.a.Stevens
#2 Louis.Stevens
#3 Louis.Stevens2
#dtype: object
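The output above still contains the dots, the middle initial and the trailing digit, so it stops short of the expected output. One possible way to finish the job on individual strings, without a regex, is sketched below. The addresses are hypothetical (the originals are redacted) and local_part_to_name is an illustrative name, not something from the answer.

```python
def local_part_to_name(email):
    """Build 'First Last' from the part of the address before '@'."""
    local = email.partition("@")[0]                          # drop "@" and the domain
    local = "".join(ch for ch in local if not ch.isdigit())  # drop digits
    parts = [p for p in local.split(".") if p]               # split on dots, skip empties
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0]
    return f"{parts[0]} {parts[-1]}"                         # first and last piece only

# Hypothetical addresses; the real ones are redacted in the thread.
for email in ["Louis.a.Stevens@example.com", "Louis.Stevens2@example.com"]:
    print(local_part_to_name(email))
```

Keeping only the first and last dot-separated pieces is an assumption that every address follows the first.[middle.]last pattern seen in the question.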
CodePudding user response:
Use regex findall to extract the alphanumeric run at the start of the string and the alphanumeric run immediately before @, join the two with a space, then replace the digits with nothing. Code below:
email
0 [email protected]
1 [email protected]
2 [email protected]
3 [email protected]
4 [email protected]
5 [email protected]
df = df.assign(email_new=df['email'].str.findall(r'^\w+|\w+(?=@)').str.join(' ').str.replace(r'\d', '', regex=True))
email email_new
0 [email protected] Louis Stevens
1 [email protected] Louis Stevens
2 [email protected] Louis Stevens
3 [email protected] Louis Stevens
4 [email protected] Mike Williams
5 [email protected] Lebron James
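The same findall idea also works on a plain string, which is closer to the loop described in the question. A minimal sketch, assuming hypothetical addresses (the originals are redacted) and an illustrative helper name findall_name:

```python
import re

def findall_name(email):
    # ^\w+      : the word characters at the very start of the string
    # \w+(?=@)  : the word characters sitting immediately before "@"
    pieces = re.findall(r'^\w+|\w+(?=@)', email)
    # Join the pieces with a space, then strip any digits.
    return re.sub(r'\d', '', ' '.join(pieces))

# Hypothetical addresses; the real ones are redacted in the thread.
results = [findall_name(e) for e in (
    "Louis.a.Stevens@example.com",
    "Louis.Stevens2@example.com",
    "Lebron.James@example.com",
)]
print(results)
```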