Regex to get all text occurrences between parentheses encapsulated by a second pattern

Time:11-22

I need a regex that gets every piece of text between parentheses, given that the whole content is delimited by the word BEGIN at the start and the characters ---- at the end.

Input example:

BEGIN       ) Tj\nET37.66 533 Td\n( Td\n(I NEED THIS TEXT      ) Tj\nET\nBT\n37.334 Td\n(AND ALSO NEED THIS TEXT         ) Tj\nET\nBT\n37.55 Td\n(------------

Expected matches:

I NEED THIS TEXT
AND ALSO NEED THIS TEXT

I already came up with something like (?<=BEGIN).*(?=\(--) for the outer pattern, but I couldn't figure out how to get every text occurrence inside parentheses within that span.

CodePudding user response:

With the Python PyPI regex library, you can use

(?s)(?:\G(?!^)\)|BEGIN)(?:(?!\(--).)*?\((?!--)\K[^()]*

See the regex demo

Details:

  • (?s) - a DOTALL inline modifier making . match line break chars
  • (?:\G(?!^)\)|BEGIN) - either BEGIN or the end of the previous successful match and a ) right after
  • (?:(?!\(--).)*? - any char, zero or more but as few as possible occurrences, that does not start a (-- char sequence
  • \( - a ( char
  • (?!--) - right after (, there should be no --
  • \K - match reset operator: what was matched before is discarded from the overall match memory buffer
  • [^()]* - zero or more chars other than ( and )
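
If the third-party regex module is not an option, the same goal can be sketched with the standard re module in two steps. This is a different technique from the single pattern above, not a translation of it (re supports neither \G nor \K). It assumes the \n sequences in the question's input are real line breaks; the function name extract_between and the stripping of trailing spaces are choices made for this illustration only.

```python
import re

# Sample input from the question, assuming "\n" stands for real line breaks.
text = (
    "BEGIN       ) Tj\nET37.66 533 Td\n( Td\n"
    "(I NEED THIS TEXT      ) Tj\nET\nBT\n37.334 Td\n"
    "(AND ALSO NEED THIS TEXT         ) Tj\nET\nBT\n37.55 Td\n(------------"
)


def extract_between(text):
    """Return the text of every balanced ( ... ) group between BEGIN and '(--'.

    extract_between is a hypothetical helper name, not part of any library.
    """
    # Step 1: isolate the region after BEGIN and before the first "(--".
    # DOTALL lets "." cross line breaks.
    region = re.search(r"BEGIN(.*?)\(--", text, flags=re.DOTALL)
    if region is None:
        return []
    # Step 2: inside that region, capture runs of non-parenthesis characters
    # that are enclosed by "(" and ")". Trailing padding is stripped here,
    # which the single-pattern answer does not do.
    return [m.strip() for m in re.findall(r"\(([^()]*)\)", region.group(1))]


print(extract_between(text))
# ['I NEED THIS TEXT', 'AND ALSO NEED THIS TEXT']
```

Splitting the work keeps each pattern simple: the first only finds the outer boundaries, and the second only deals with parentheses. The unmatched "( Td" fragment is skipped because [^()]* cannot run across the next "(".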

CodePudding user response:

Try:

\(((?:(?!BEGIN).)*?)\)(?=.*---)

Regex demo.


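  • \(((?:(?!BEGIN).)*?)\) - match a ( ... ) pair and capture the text between the parentheses in group 1; the captured text must not contain BEGIN
  • (?=.*---) - a positive lookahead: --- must appear somewhere after this match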
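
A minimal sketch of the lookahead idea with Python's standard re module follows. It is a variant, not the pattern exactly as posted: in re, "." does not cross line breaks unless DOTALL is set, and once DOTALL is set "." can also run across an unmatched "(", so this sketch uses [^()] inside the parentheses instead. It assumes the \n sequences in the question's input are real line breaks; the name PATTERN is chosen for this illustration.

```python
import re

# Sample input from the question, assuming "\n" stands for real line breaks.
text = (
    "BEGIN       ) Tj\nET37.66 533 Td\n( Td\n"
    "(I NEED THIS TEXT      ) Tj\nET\nBT\n37.334 Td\n"
    "(AND ALSO NEED THIS TEXT         ) Tj\nET\nBT\n37.55 Td\n(------------"
)

# Variant of the posted pattern (an assumption of this sketch):
#   \( ... \)            a parenthesized group
#   (?:(?!BEGIN)[^()])*  its content: no parentheses, and no BEGIN inside
#   (?=.*---)            "---" must still occur later in the input
PATTERN = re.compile(r"\(((?:(?!BEGIN)[^()])*)\)(?=.*---)", flags=re.DOTALL)

# findall returns the content of capture group 1 for every match.
matches = [m.strip() for m in PATTERN.findall(text)]
print(matches)
# ['I NEED THIS TEXT', 'AND ALSO NEED THIS TEXT']
```

Note that, like the posted pattern, this version does not check that BEGIN comes before the match; it only requires that --- follows it.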