Match entire string between first bracket to last corresponding bracket even multiline

Time:10-19

I want to capture everything between the opening bracket that follows a specific pattern, e.g. x.set(, and its matching closing bracket, even when the text spans several lines (that is, consume as much text as needed until the corresponding closing bracket is found). Example string:

"ver = '1.0'
if x.set('1.2'):
    p = x.set('python_version', None)
    x = x.set('test_template', DEFAULT, p(x,b),
    z())"

The result I'm looking for (using re.findall) is:

find_all_res = [['1.2'], ['python_version', None], ['test_template', DEFAULT, p(x,b),\nz()]]

I'm currently using:

re.findall(pattern="(?<![0-9a-zA-Z_])x.set([\s\S]+?)(?<=[)])(\s)", string=value)

And the result I get:

find_all_res = [("('1.2'):\n        p = x.set('python_version')", '\n'), ("('test_template', DEFAULT, p(x,b),\n        z())", '\n')]

CodePudding user response:

The built-in re module does not support recursion, so you can install the PyPI regex library (pip install regex) and use

\bx\.set\((?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]++|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))*\s*\)

Details:

  • \b - a word boundary
  • x\.set\( - x.set( string
  • (?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]++|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))* - zero or more occurrences of:
    • \s*(?:,\s*)? - zero or more whitespaces, and then an optional occurrence of , and zero or more whitespaces
    • (?<o> - Group "o" (it will contain all the strings you need):
      • [-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?| - a number pattern, or
      • \w+(?<a>\((?:[^()]++|(?&a))*\))* - one or more word chars, and then zero or more (...) substrings with any amount of nested parentheses ((?&a) recurses into Group "a"), or
      • '[^'\\]*(?:\\.[^'\\]*)*'| - a single quoted string literal with escape sequence support, or
      • "[^"\\]*(?:\\.[^"\\]*)*" - a double quoted string literal with escape sequence support
    • ) - end of group
  • \s* - zero or more whitespaces
  • \) - a ) char.
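
To make the recursive part less opaque, here is a minimal standard-library sketch of what Group "a" is doing: starting at an opening parenthesis, consume characters and track the nesting depth until the matching closing parenthesis is reached. The function name match_balanced is made up for this illustration and is not part of the regex library. Like Group "a", it does not treat parentheses inside quoted strings specially.

```python
def match_balanced(s, start):
    """Return the index just past the ')' that closes the '(' at s[start].

    Returns -1 if s[start] is not '(' or the parentheses never balance.
    """
    if start >= len(s) or s[start] != "(":
        return -1
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "(":
            depth += 1          # one level deeper, like recursing via (?&a)
        elif s[i] == ")":
            depth -= 1          # leaving one nesting level
            if depth == 0:
                return i + 1    # the outermost parenthesis is closed
    return -1


sample = "p(x, f(y)) + q"
end = match_balanced(sample, 1)
print(sample[1:end])  # (x, f(y))
```

The regex expresses the same idea declaratively: [^()]++ consumes the non-parenthesis characters, and (?&a) handles each nested pair.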

See a Python demo:

import regex
text = r"""ver = '1.0'
if x.set('1.2'):
    p = x.set('python_version', None)
    x = x.set('test_template', DEFAULT, p(x,b),
    z())"""
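# Group "o" captures each argument; Group "a" recursively matches nested parentheses
rx = r'''\bx\.set\((?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]++|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))*\s*\)'''
# captures("o") returns every value Group "o" matched within one x.set(...) call
print([m.captures("o") for m in regex.finditer(rx, text, regex.S)])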

Output:

[["'1.2'"], ["'python_version'", 'None'], ["'test_template'", 'DEFAULT', 'p(x,b)', 'z()']]
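
If the text being searched is itself valid Python, as the sample string happens to be, a different route is to let the standard ast module find the calls instead of a regex. This is only a sketch under that assumption, not the approach used above: the helper name x_set_args is made up, it needs Python 3.8+ for ast.get_source_segment, and it ignores keyword arguments.

```python
import ast


def x_set_args(source):
    """Return the source text of each positional argument of every x.set(...) call."""
    tree = ast.parse(source)  # raises SyntaxError if source is not valid Python
    calls = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "set"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "x"
    ]
    # ast.walk does not guarantee source order, so sort by position
    calls.sort(key=lambda n: (n.lineno, n.col_offset))
    return [
        [ast.get_source_segment(source, arg) for arg in call.args]
        for call in calls
    ]


text = """ver = '1.0'
if x.set('1.2'):
    p = x.set('python_version', None)
    x = x.set('test_template', DEFAULT, p(x,b),
    z())"""
print(x_set_args(text))
```

For the sample string this yields the same lists as the regex demo. The trade-off is that it fails with a SyntaxError on fragments that do not parse as Python, where the regex would still work.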