Regex pattern skips last matches and misses content with parenthesis

Time:08-20

Say I have a string:

r'pat1=a, pat2=b, (e, e*89=f), bb, pat3=c, pat4=hi, pat10=ex'

I need to extract patterns as:

pat1=a, 
pat2=b, (e, e*89=f), bb, 
pat3=c, 
pat4=hi, 
pat10=ex

This is the pattern I tried:

re.findall(r'(pat\d*.*?)[(pat\d*)|$]', s)

which gives me:

['pat1=', 'pat2=b, ', 'pat3=c, ', 'pat1']

I am mainly interested in understanding how exactly my pattern works here and why it did not match the required strings. I would also like to know what a solution could be.

CodePudding user response:

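The pattern that you tried, (pat\d*.*?)[(pat\d*)|$], matches pat and optional digits, then as few characters as possible until it can match one of the characters listed in the character class [(pat\d*)|$]. Inside square brackets, the parentheses, | and $ are literal characters rather than a group, an alternation or an end-of-string anchor, so the class matches exactly one character: (, ), p, a, t, a digit, *, | or $. That character is consumed by the match but is not part of the capture group, which is why the first result is pat1= without the a. At the end of the string no such character follows pat10=ex, so the engine backtracks until \d* matches only the 1 and the class matches the 0, giving pat1.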
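To make that behaviour visible, here is a small sketch that runs the attempted pattern with re.finditer and records, for every match, the captured group and the single character the class consumed. The variable names (attempt, groups, consumed) are only illustrative.

```python
import re

s = 'pat1=a, pat2=b, (e, e*89=f), bb, pat3=c, pat4=hi, pat10=ex'
attempt = r'(pat\d*.*?)[(pat\d*)|$]'

groups = []    # what the capture group holds for each match
consumed = []  # the one character matched by the character class

for m in re.finditer(attempt, s):
    groups.append(m.group(1))
    # The full match is the group plus exactly one character from the class.
    consumed.append(m.group(0)[-1])

for g, c in zip(groups, consumed):
    print(repr(g), 'followed by class character', repr(c))
```

Each consumed character (a, (, p and 0) is one of the literals listed in the class, and because it is consumed, the next search starts after it. That is also why pat4 is never found: its p was already used up by the previous match.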

To get your desired matches, you don't want to consume anything after .*?. Instead, you want to assert, without matching, that what follows is either the start of the next part (using the same pattern as for pat) or, for the last part, the end of the string.


You could write the pattern as:

\bpat\d+=.*?(?=\s*\bpat\d+=|$)

The pattern matches:

  • \bpat\d+= A word boundary, then match pat followed by 1+ digits and =
  • .*? Match as few chars as possible
  • (?= Positive lookahead, assert to the right
    • \s*\bpat\d+= Match optional whitespace chars, then pat, 1+ digits and =
    • | Or
    • $ Assert the end of the string for the last part
  • ) Close the lookahead
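As a quick check, the lookahead pattern can be run with re.findall on the string from the question. This is a minimal sketch that assumes the quantifier after \d is + (one or more digits), which is what lets pat10 match; the names pattern and parts are illustrative.

```python
import re

s = 'pat1=a, pat2=b, (e, e*89=f), bb, pat3=c, pat4=hi, pat10=ex'

# Assumption: \d+ (one or more digits) after "pat".
# The lookahead only asserts what follows, so the next "patN=" is not consumed.
pattern = re.compile(r'\bpat\d+=.*?(?=\s*\bpat\d+=|$)')

parts = pattern.findall(s)
for part in parts:
    print(repr(part))
```

Because \s* sits inside the lookahead, the whitespace between two parts is not included in the match, so each part ends at the comma rather than at the trailing space. If the trailing space is wanted, moving \s* in front of the lookahead is one possible variation.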

