How to combine Groups in Python RegEx?

Time:08-27

How to combine Groups (p1 and p2) in the following code?

import re

txt = "Sab11Mba11"
p1 = "(S(a|b)(a|b))"
p2 = "(M(a|b)(a|b))"
px = "(" + p1 + '|' + p2 + ")"

print(re.findall(p1, txt)) # [('Sab', 'a', 'b')]
print(re.findall(p2, txt)) # [('Mba', 'b', 'a')]
print(re.findall(px, txt)) # [('Sab', 'Sab', 'a', 'b', '', '', ''), ('Mba', '', '', '', 'Mba', 'b', 'a')]

Can you please explain why I get empty strings, and how to get [('Sab', 'a', 'b'), ('Mba', 'b', 'a')] instead?

CodePudding user response:

You can use a branch reset group. The built-in re module does not support it, so this requires the third-party regex module from PyPI instead:

import regex as re

txt = 'Sab11Mba11'
p1 = r'(S(a|b)(a|b))'
p2 = r'(M(a|b)(a|b))'

px = r'(?|' + p1 + '|' + p2 + ')'
print(re.findall(px, txt))

Prints:

[('Sab', 'a', 'b'), ('Mba', 'b', 'a')]

Group numbers will be reused across different branches of a branch reset.

In general, remember to use raw-string notation when working with regular expressions, since S, a and b are presumably placeholders for other constructs that may contain backslashes. Also note that you don't need px at all if you build the pattern with an f-string. For example:

re.findall(fr'(?|{p1}|{p2})', txt)

CodePudding user response:

re.findall returns one item per capturing group in the pattern, so groups that did not participate in the match are still output, as empty strings.
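To see where the empty strings come from, it may help to look at the match objects directly. This is a minimal sketch using only the standard re module and the patterns from the question; the names group_count and all_groups are introduced here for illustration. A match object reports a non-participating group as None, and re.findall turns that None into ''.

```python
import re

txt = "Sab11Mba11"
p1 = r"(S(a|b)(a|b))"
p2 = r"(M(a|b)(a|b))"
px = "(" + p1 + "|" + p2 + ")"

# Groups are numbered by their opening parenthesis, so the combined
# pattern has 7 of them: 1 outer, 3 from p1, 3 from p2.
group_count = re.compile(px).groups

# Every match reports all 7 groups; the groups that belong to the
# branch that did not match are None.
all_groups = [m.groups() for m in re.finditer(px, txt)]

print(group_count)
for groups in all_groups:
    print(groups)
```

Only one of the two branches can match at a time, so three of the seven slots are always unfilled in each result.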

You need to remove the outer parentheses and filter the empty values out of the resulting tuples:

import re

txt = "Sab11Mba11"
p1 = "(S(a|b)(a|b))"
p2 = "(M(a|b)(a|b))"
px = p1 + '|' + p2
print([tuple(filter(lambda m: m != '', x)) for x in re.findall(px, txt)])
# => [('Sab', 'a', 'b'), ('Mba', 'b', 'a')]
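One caveat with filtering on '' is that it would also drop a group that participated in the match but legitimately matched an empty string. A possible variant, sketched here with the standard re module only, uses re.finditer and filters on None instead, because a match object reports a non-participating group as None rather than ''. The variable name result is just for illustration.

```python
import re

txt = "Sab11Mba11"
p1 = r"(S(a|b)(a|b))"
p2 = r"(M(a|b)(a|b))"
px = p1 + "|" + p2

# m.groups() contains None for every group in the branch that did not
# match, so dropping None keeps only the groups that participated.
result = [
    tuple(g for g in m.groups() if g is not None)
    for m in re.finditer(px, txt)
]
print(result)
```

For the input in the question both approaches give the same list; they differ only when a participating group can match the empty string.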

