How to get the content of groups of balanced parentheses

Time:09-16

Input strings:

text1 = 'xx(aa)(bb)xx'
text2 = 'xx(aa(bb))xx'

Expected result:

('aa', 'bb')
('aa(bb)', 'bb')

My approach, which does not give the expected result:

import re
re.compile(r'\(\s?(.+?)\s?\)')
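A quick check of why this pattern falls short, assuming the garbled ". ?" in the question was the lazy quantifier ".+?": the lazy group stops at the first ")" it reaches, so it works for adjacent groups but cuts a nested group short. Python's built-in re module has no recursion construct, which is why the answer below reaches for the third-party regex module.

```python
import re

# The question's pattern, assuming ". ?" was originally ".+?" (lazy match).
pattern = re.compile(r'\(\s?(.+?)\s?\)')

text1 = 'xx(aa)(bb)xx'
text2 = 'xx(aa(bb))xx'

print(pattern.findall(text1))  # ['aa', 'bb']  (adjacent groups are fine)
print(pattern.findall(text2))  # ['aa(bb']     (stops at the first ')')
```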

Answer:

You can install the regex module from PyPI (pip install regex) and use:

import regex

texts = ['xx(aa)(bb)xx', 'xx(aa(bb))xx']
rx = r'\(((?:[^()]+|(?R))*)\)'

for text in texts:
    print(regex.findall(rx, text, overlapped=True))

Output:

['aa', 'bb']
['aa(bb)', 'bb']

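The \(((?:[^()]+|(?R))*)\) regex is a common PCRE-compliant pattern that matches a substring enclosed in balanced, possibly nested, parentheses: [^()]+ consumes runs of characters that are not parentheses, and (?R) recursively matches the whole pattern again for each inner pair. I added a capturing group around the contents between the outer brackets, so findall returns only that content.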
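Since the built-in re module cannot express (?R), here is a rough stdlib-only illustration of what the recursive pattern matches from a single opening parenthesis. The helper group_at is hypothetical (not part of the answer or of any library): it counts nesting depth until the matching ")" is found. It is a sketch of the idea, not how the regex engine is implemented.

```python
from typing import Optional


def group_at(text: str, start: int) -> Optional[str]:
    """Return the content of the balanced (...) group opening at text[start], or None."""
    if not text.startswith('(', start):
        return None  # no opening parenthesis at this position
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '(':
            depth += 1
        elif text[i] == ')':
            depth -= 1
            if depth == 0:
                # Matching ')' found: return everything between the outer brackets.
                return text[start + 1:i]
    return None  # reached the end of the string: the group is never closed


print(group_at('xx(aa(bb))xx', 2))  # aa(bb)
print(group_at('xx(aa(bb))xx', 5))  # bb
```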

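To get all overlapping matches, the overlapped=True option is passed to regex.findall. Without it, the search resumes after the end of each match, so in xx(aa(bb))xx only the outer group would be returned and the inner (bb) would be skipped.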
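If installing a third-party module is not an option, a stack-based scan is one stdlib-only alternative; note this is a different technique from the regex answer, not part of it. The function name balanced_groups is hypothetical, and the sketch assumes stray ")" characters and unclosed "(" should simply be ignored. Each group is recorded when its closing ")" is reached, so nested and adjacent groups all come out of a single pass, similar in effect to overlapped=True.

```python
def balanced_groups(text: str) -> list[str]:
    """Return the contents of every balanced (...) group, ordered by opening position."""
    stack: list[int] = []               # indices of currently unmatched '('
    groups: list[tuple[int, str]] = []  # (start index, content) pairs
    for i, ch in enumerate(text):
        if ch == '(':
            stack.append(i)
        elif ch == ')' and stack:       # a ')' with no opener is ignored
            start = stack.pop()
            groups.append((start, text[start + 1:i]))
    # Any '(' still on the stack was never closed and is dropped.
    groups.sort()  # inner groups close first, so re-sort by start index
    return [content for _, content in groups]


for text in ['xx(aa)(bb)xx', 'xx(aa(bb))xx']:
    print(balanced_groups(text))
# ['aa', 'bb']
# ['aa(bb)', 'bb']
```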
