I need a regex that gets all the text occurrences between parentheses, keeping in mind that all the content is enclosed between the word BEGIN at the start and the chars ---- at the end.
Input example:
BEGIN ) Tj\nET37.66 533 Td\n( Td\n(I NEED THIS TEXT ) Tj\nET\nBT\n37.334 Td\n(AND ALSO NEED THIS TEXT ) Tj\nET\nBT\n37.55 Td\n(------------
Expected matches:
I NEED THIS TEXT
AND ALSO NEED THIS TEXT
I already wrote something like (?<=BEGIN).*(?=\(--)
for the outer pattern, but I couldn't figure out how to get all the text occurrences inside parentheses within that region.
CodePudding user response:
With the PyPI regex library for Python (the built-in re module supports neither \G nor \K), you can use
(?s)(?:\G(?!^)\)|BEGIN)(?:(?!\(--).)*?\((?!--)\K[^()]*
See the regex demo
Details:
(?s) - a DOTALL inline modifier making . match line break chars
(?:\G(?!^)\)|BEGIN) - either BEGIN or the end of the previous successful match and a ) right after
(?:(?!\(--).)*? - any char, zero or more but as few as possible occurrences, that does not start a (-- char sequence
\( - a ( char
(?!--) - right after (, there should be no --
\K - match reset operator: what was matched before is discarded from the overall match memory buffer
[^()]* - zero or more chars other than ( and )
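If installing the third-party regex library is not an option, roughly the same result can be reached in two steps with the standard re module. This is a different technique from the single-pattern answer above, offered only as a minimal sketch: it assumes the \n sequences in the sample are real newlines, and the helper name extract_parenthesized is made up for illustration.

```python
import re

# Sample input from the question, assuming the \n sequences are real newlines.
text = (
    "BEGIN ) Tj\nET37.66 533 Td\n( Td\n(I NEED THIS TEXT ) Tj\nET\nBT\n"
    "37.334 Td\n(AND ALSO NEED THIS TEXT ) Tj\nET\nBT\n37.55 Td\n(------------"
)


def extract_parenthesized(source):
    """Hypothetical helper: texts in (...) between BEGIN and the first '(--'."""
    # Step 1: isolate the region; DOTALL lets '.' cross line breaks.
    region = re.search(r"BEGIN(.*?)\(--", source, re.DOTALL)
    if region is None:
        return []
    # Step 2: keep only parenthesized runs that contain no nested parentheses,
    # and trim the surrounding whitespace from each one.
    return [m.strip() for m in re.findall(r"\(([^()]*)\)", region.group(1))]


matches = extract_parenthesized(text)
print(matches)  # ['I NEED THIS TEXT', 'AND ALSO NEED THIS TEXT']
```

The unbalanced "( Td" fragment is skipped because [^()]* cannot cross the next ( before reaching a closing ).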
CodePudding user response:
Try:
\(((?:(?!BEGIN).)*?)\)(?=.*---)
\(((?:(?!BEGIN).)*?)\) - match everything between ( and ), but not BEGIN
(?=.*---) - .*--- must follow after this match
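As a rough illustration, this pattern also runs under Python's standard re module. The sample below is a simplified single-line string, which is an assumption and not the exact input from the question.

```python
import re

# Simplified single-line sample (an assumption, not the question's exact input).
sample = "BEGIN ) Tj (I NEED THIS TEXT ) Tj (AND ALSO NEED THIS TEXT ) Tj (------"

pattern = re.compile(r"\(((?:(?!BEGIN).)*?)\)(?=.*---)")

# findall returns the contents of the single capturing group for each match.
found = [m.strip() for m in pattern.findall(sample)]
print(found)  # ['I NEED THIS TEXT', 'AND ALSO NEED THIS TEXT']
```

For input that spans several lines, re.DOTALL would be needed so that . can cross line breaks. Because the lazy . can also consume a stray (, an unbalanced opening parenthesis earlier in the text may end up inside the capture.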