Extract substring after certain text and before hyphen (regex)

I have a list of emails that match a similar pattern as such:

The first email has 5 parts while the second one has 4 parts (separated by hyphens) before the @mail.com.

I need to extract the group_code that comes after the nonprod/prod portion of the group email.

For example, for [email protected] I need to extract red,

and for [email protected] I need to extract blue.

The portion before the group code will always be prod or nonprod; furthermore, the substring "prod-" will always appear immediately before the group code.

How can I extract the group code reliably from emails that have different numbers of parts?

CodePudding user response：

import re

re.findall(r'prod-(.*)-', s)  # s is a single email string

df['group'] = df['col2'].str.extract(r'prod-(.*)-')
df

    col1    col2                                    group
0   1       [email protected]    red
1   2       [email protected]            blue
2   3                                               NaN
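
One thing to watch with the pattern above: `.*` is greedy, so it runs up to the last hyphen in the string. That is fine when exactly one part follows the group code, but it would capture too much if more parts followed. Here is a minimal sketch of a stricter variant. The addresses are made up for illustration (the real ones are not visible in this thread), and `extract` is just a hypothetical helper name.

```python
import re

# Hypothetical addresses (the real ones are hidden in the thread).
emails = [
    "team-abc-nonprod-red-xyz@mail.com",   # 5 hyphen-separated parts
    "team-prod-blue-xyz@mail.com",         # 4 parts
    "team-prod-green-x-y@mail.com",        # extra part after the group code
]

greedy = re.compile(r"prod-(.*)-")        # pattern from the answer above
bounded = re.compile(r"prod-([^-@]+)-")   # stops at the first hyphen after the code

def extract(pattern, email):
    """Return the captured group code, or None when the pattern does not match."""
    m = pattern.search(email)
    return m.group(1) if m else None

greedy_codes = [extract(greedy, e) for e in emails]
bounded_codes = [extract(bounded, e) for e in emails]
print(greedy_codes)
print(bounded_codes)
```

Both patterns agree on the first two addresses; they only differ on the third, where the greedy version also swallows the part after the group code.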

CodePudding user response：

Using rfind & find

email2 = '[email protected]'
email = '[email protected]'

start = 'prod-'  # 'nonprod-' also ends with 'prod-', so one marker covers both
end = '-'

# Slice from just after 'prod-' up to the last hyphen
print(email[email.find(start) + len(start):email.rfind(end)])

output

red
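
The same find-based idea can be wrapped in a small function that also handles addresses without the marker. This is only a sketch under some assumptions: `group_code` is a hypothetical helper name, the sample addresses are invented because the originals are hidden, and it searches for the next hyphen after the marker instead of the last one, so extra trailing parts do not matter.

```python
def group_code(email, marker="prod-", sep="-"):
    """Return the text between the first 'prod-' and the next hyphen, or None."""
    start = email.find(marker)
    if start == -1:
        return None               # marker not present
    start += len(marker)          # move past 'prod-'
    end = email.find(sep, start)  # next hyphen after the group code
    if end == -1:
        return None               # nothing follows the group code
    return email[start:end]

# Hypothetical addresses, not the ones from the question.
print(group_code("team-abc-nonprod-red-xyz@mail.com"))
print(group_code("team-prod-blue-xyz@mail.com"))
print(group_code("someone@mail.com"))
```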

CodePudding user response：

This should work:

(?<=prod-)[a-z]+

Based on the input data that we have

[email protected]
[email protected]

and given that nonprod or prod always comes right before the string we are searching for, we can use a regex assertion. In this case it is a positive lookbehind assertion, (?<=prod-), which requires prod- to appear immediately before the letters matched by the [a-z] character class.

Note: the assertion itself is not part of the match result.
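
For reference, a minimal sketch of how the lookbehind could be used from Python's re module. It uses a `+` quantifier so the whole run of lowercase letters is matched rather than a single letter, and it assumes the group code consists of lowercase letters only. The addresses are invented for illustration, since the real ones are not shown in this thread.

```python
import re

# Lookbehind: 'prod-' must precede the match but is not included in it.
lookbehind = re.compile(r"(?<=prod-)[a-z]+")

# Hypothetical addresses for illustration only.
emails = [
    "team-abc-nonprod-red-xyz@mail.com",
    "team-prod-blue-xyz@mail.com",
    "someone@mail.com",
]

codes = []
for email in emails:
    m = lookbehind.search(email)
    codes.append(m.group(0) if m else None)  # group(0) is the full match

print(codes)
```

Because the assertion is zero-width, no capture group is needed: the full match is already the group code.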