Regex Expression help - python

Time:09-29

I need help from some regex gurus. I have been struggling with this one for a little while and can't get it working as intended.

This is my regex pattern at the moment. It gets everything between the first ',' and ')':

df['regex_2'] = df['name'].str.extract(r'\,(.*?)\)')

Text 123 (SDC, XUJ)
Text BCD (AUD)
Text 123 (AUD, XTJ)
Text BCSS (AUD,TACT,HGI7649AU)


` XUJ`
``
` XTJ`
`TACT,HGI7649AU`

However, what I need is all characters after the last comma and before the closing bracket. Please see the examples below.

Text 123 (SDC, XUJ)
Text BCD (AUD)
Text 123 (AUD, XTJ)
Text BCSS (AUD,TACT,HGI7649AU)


`XUJ`
`` 
`XTJ`
`HGI7649AU`

CodePudding user response:

The pattern used matches any character after the comma, including commas themselves:

r'\,(.*?)\)'

In the following test case this yields both tokens after the first comma, because . also matches a comma:

Text BCSS (AUD,TACT,HGI7649AU) -> TACT,HGI7649AU
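A minimal sketch of why the lazy quantifier does not help here, using only the standard `re` module and the sample line from the question. The search anchors at the leftmost comma it can match from, and `.*?` then grows one character at a time until it reaches the first `)`. The variable names are illustrative.

```python
import re

line = "Text BCSS (AUD,TACT,HGI7649AU)"

# The match starts at the first comma; .*? is lazy about where it stops,
# not about where it starts, so the second comma is swallowed by the group.
m = re.search(r',(.*?)\)', line)
captured = m.group(1)
print(captured)  # TACT,HGI7649AU
```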

One way to achieve the goal of only capturing the token after the last comma and before the parenthesis (if one exists) is to instead match on all characters excluding commas:

r'\,([^,]*?)\)'
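As a rough check, the sketch below runs that pattern over the four sample strings with the standard `re` module. It also tries a variant with `\s*` after the comma; that variant is an assumption added here, not part of the answer above, and is meant to drop the space that otherwise ends up at the start of the capture. The helper name `last_token` is hypothetical.

```python
import re

samples = [
    "Text 123 (SDC, XUJ)",
    "Text BCD (AUD)",
    "Text 123 (AUD, XTJ)",
    "Text BCSS (AUD,TACT,HGI7649AU)",
]

# Pattern as posted (the backslash before the comma is optional):
# it keeps the space that follows the comma.
as_posted = re.compile(r',([^,]*?)\)')
# Variant (an assumption, not from the answer): \s* consumes that space.
trimmed = re.compile(r',\s*([^,]*?)\)')

def last_token(pattern, text):
    """Return the captured token, or '' when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else ''

posted_results = [last_token(as_posted, s) for s in samples]
trimmed_results = [last_token(trimmed, s) for s in samples]
print(posted_results)   # [' XUJ', '', ' XTJ', 'HGI7649AU']
print(trimmed_results)  # ['XUJ', '', 'XTJ', 'HGI7649AU']
```

The same pattern string can be passed to `df['name'].str.extract(...)` as in the question; there, rows without a match come back as NaN rather than an empty string.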

CodePudding user response:

How about the following?

import re

text = """Text 123 (SDC, XUJ)
Text BCD (AUD)
Text 123 (AUD, XTJ)
Text BCSS (AUD,TACT,HGI7649AU)"""

for line in text.splitlines():
    m = re.search(r',\s*(\w+)\)', line)
    print(m.group(1) if m else '')

Output:

XUJ

XTJ
HGI7649AU

Note that I am using m.group(1) if m else '' to handle the case where the regex does not find the pattern, e.g., in the second line of your example.
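One caveat worth sketching: in a pattern like `,\s*(\w+)\)`, `\w+` matches only letters, digits and underscores, so a final token containing other characters would not be found. The input below is hypothetical (nothing like it appears in the question's data), and the broader character class `[^,()]+` is just one possible alternative.

```python
import re

# Hypothetical input: the hyphenated code "AB-12" is not in the question's data.
line = "Text 456 (AUD, AB-12)"

# \w+ stops at the hyphen, so the closing bracket is never reached.
word_only = re.search(r',\s*(\w+)\)', line)
# [^,()]+ accepts anything except commas and brackets.
broader = re.search(r',\s*([^,()]+)\)', line)

word_result = word_only.group(1) if word_only else ''
broad_result = broader.group(1) if broader else ''
print(repr(word_result), repr(broad_result))  # '' 'AB-12'
```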
