I have a list of emails that match a similar pattern as such:
The first email has 5 parts while the second one has 4 parts (marked by the hyphen) before the @mail.com
I need to extract the group_code that comes after the nonprod/prod portion of the group email.
For example, for [email protected] I need to extract red,
and for [email protected] I need to extract blue.
The portion before the group code will always be prod or nonprod; furthermore, the substring "prod-" will always appear immediately before the group code.
How can I extract the group code from emails that have different numbers of parts, so that I always get the group code?
CodePudding user response:
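import re
# s is a single email string; the capture group holds the text between "prod-" and the last hyphen
re.findall(r'(?:prod-)(.*)-', s)
# The same pattern applied to a pandas DataFrame column
df['group'] = df['col2'].str.extract(r'(?:prod-)(.*)-')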
df
col1 col2 group
0 1 [email protected] red
1 2 [email protected] blue
2 3 NaN
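A minimal sketch of how the pattern above behaves, using only the standard library. The addresses are hypothetical stand-ins, since the real ones are redacted in the question. The `strict` pattern is not from the answer; it is an assumed alternative that stops at the next hyphen instead of the last one, which matters if an address ever has more than one part after the group code.

```python
import re

# Hypothetical addresses; the originals are redacted in the question.
emails = [
    "abc-def-nonprod-red-group@mail.com",   # 5 hyphen-separated parts
    "abc-prod-blue-group@mail.com",         # 4 hyphen-separated parts
]

greedy = re.compile(r"prod-(.*)-")      # pattern from the answer: runs to the LAST hyphen
strict = re.compile(r"prod-([^-@]+)")   # assumed alternative: stops at the NEXT hyphen or '@'

def extract(pattern, email):
    """Return the first capture group, or None when the pattern does not match."""
    m = pattern.search(email)
    return m.group(1) if m else None

greedy_codes = [extract(greedy, e) for e in emails]
strict_codes = [extract(strict, e) for e in emails]
print(greedy_codes)
print(strict_codes)

# With two parts after the group code, the two patterns disagree.
longer = "abc-prod-blue-group-extra@mail.com"
print(extract(greedy, longer), extract(strict, longer))
```

If the addresses always have exactly one part after the group code, both patterns give the same result.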
CodePudding user response:
Using find and rfind:
email2 = '[email protected]'
email = '[email protected]'
start = 'prod-'  # also found inside 'nonprod-', so a single marker covers both cases
end = '-'
# Slice from just after 'prod-' up to the last hyphen
print(email[email.find(start) + len(start):email.rfind(end)])
print("\n")
Output:
red
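The same idea can be wrapped in a small helper. This is a sketch under assumptions: the function name `code_between` and the sample addresses are hypothetical, and it searches for the next hyphen with `find` rather than the last one with `rfind`, so it also handles addresses where the group code is the final part before the `@`.

```python
def code_between(email, marker="prod-", sep="-"):
    """Return the text between `marker` and the next `sep` in the local part.

    Returns None when `marker` is absent.
    """
    local = email.split("@", 1)[0]      # everything before the '@'
    start = local.find(marker)          # 'prod-' is also found inside 'nonprod-'
    if start == -1:
        return None
    start += len(marker)
    end = local.find(sep, start)        # next hyphen after the group code
    if end == -1:                       # group code is the last part
        end = len(local)
    return local[start:end]

# Hypothetical addresses; the originals are redacted in the question.
print(code_between("abc-def-nonprod-red-group@mail.com"))
print(code_between("abc-prod-blue@mail.com"))
```

Checking for `-1` avoids the silent wrong slice that `find` would otherwise produce when the marker is missing.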
CodePudding user response:
This should work:
(?<=prod-)[a-z]+
Given the input data
[email protected]
[email protected]
and the fact that nonprod or prod always comes directly before the string we are looking for, we can use a regular-expression assertion. Here it is a positive lookbehind, (?<=prod-), which requires "prod-" to appear immediately before a run of lowercase letters, [a-z]+. Without the + quantifier, [a-z] would match only the first letter of the group code.
Note: the text matched by the assertion is not part of the result.
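A short sketch of the lookbehind in Python's `re` module, assuming group codes consist of lowercase letters only. The helper name `code_lookbehind` and the sample addresses are hypothetical. Because the lookbehind is not part of the match, `group(0)` is already the bare group code.

```python
import re

# Positive lookbehind: require "prod-" before the match without including it.
LOOKBEHIND = re.compile(r"(?<=prod-)[a-z]+")

def code_lookbehind(email):
    """Return the lowercase run that follows 'prod-', or None if there is none."""
    m = LOOKBEHIND.search(email)
    return m.group(0) if m else None

# Hypothetical addresses; the originals are redacted in the question.
sample = "abc-def-nonprod-red-group@mail.com"
print(code_lookbehind(sample))
print(code_lookbehind("abc-prod-blue-group@mail.com"))

# Without the + quantifier, only a single character is matched.
single = re.search(r"(?<=prod-)[a-z]", sample).group(0)
print(single)
```

Python requires lookbehind patterns to have a fixed width, which `prod-` satisfies.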