Say I have a string:
r'pat1=a, pat2=b, (e, e*89=f), bb, pat3=c, pat4=hi, pat10=ex'
I need to extract patterns as:
pat1=a,
pat2=b, (e, e*89=f), bb,
pat3=c,
pat4=hi,
pat10=ex
This is the pattern I tried:
re.findall(r'(pat\d*.*?)[(pat\d*)|$]', s)
which gives me:
['pat1=', 'pat2=b, ', 'pat3=c, ', 'pat1']
I am mostly interested in understanding how exactly my pattern works here and why it did not match the required strings. Also, what could be a solution?
Answer:
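The pattern that you tried, (pat\d*.*?)[(pat\d*)|$], matches pat and optional digits, then as few characters as possible until it reaches one of the characters listed in the character class [(pat\d*)|$]. Inside square brackets, the characters ( ) * | $ are literal, not grouping, quantifier, alternation or anchor, so the class matches exactly one character: (, p, a, t, a digit, *, ), | or $. That character is consumed but sits outside the capture group, which is why 'pat1=' stops right before the a, and why the last result is 'pat1' followed by the digit 0.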
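As a quick check of that reading, here is a small sketch that reruns the pattern from the question and tests the bracketed part on its own. The variable names (original, char_class, literal_hits, misses) are only illustrative and not part of the question.

```python
import re

s = r'pat1=a, pat2=b, (e, e*89=f), bb, pat3=c, pat4=hi, pat10=ex'

# The pattern from the question: the [...] part is a character class,
# so it consumes exactly one character from the set ( p a t digit * ) | $
original = re.findall(r'(pat\d*.*?)[(pat\d*)|$]', s)
print(original)  # ['pat1=', 'pat2=b, ', 'pat3=c, ', 'pat1']

# Inside a class, "$" and "|" are literal characters,
# not an end-of-string anchor or an alternation.
char_class = re.compile(r'[(pat\d*)|$]')
literal_hits = [ch for ch in 'pat7(*)|$' if char_class.fullmatch(ch)]
misses = [ch for ch in '=, ebx' if char_class.fullmatch(ch)]
print(literal_hits)  # every character is accepted by the class
print(misses)        # [] because none of these characters is in the class
```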
To get your desired matches, you don't want to consume anything after .*? but instead assert that the next part, starting with the same pattern for pat, follows directly to the right.
And for the last part, you can assert the end of the string.
You could write the pattern as:
\bpat\d+=.*?(?=\s*\bpat\d+=|$)
The pattern matches:
\bpat\d+= Match the word pat followed by 1+ digits and =
.*? Match as few chars as possible
(?= Positive lookahead, assert to the right
\s*\bpat\d+= Match optional whitespace chars, then pat, 1+ digits and =
| Or
$ Assert the end of the string for the last part
) Close the lookahead
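A minimal sketch of the lookahead approach in Python, assuming the digits are matched with \d+ (one or more) and that s holds the string from the question. The names pattern and parts are illustrative.

```python
import re

s = r'pat1=a, pat2=b, (e, e*89=f), bb, pat3=c, pat4=hi, pat10=ex'

# The lookahead only checks what follows; it does not consume it,
# so the next "patN=" is still available for the next match.
pattern = r'\bpat\d+=.*?(?=\s*\bpat\d+=|$)'

parts = re.findall(pattern, s)
for part in parts:
    print(part)
# pat1=a,
# pat2=b, (e, e*89=f), bb,
# pat3=c,
# pat4=hi,
# pat10=ex
```

Because the pattern has no capture groups, re.findall returns the full matches. Note that . does not match a newline by default, so a multi-line input would need the re.DOTALL flag.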