I want to grab the entire text after the opening parenthesis of a specific pattern such as x.set( up to its matching closing parenthesis, even when the call spans several lines (i.e. take as much text as needed until the corresponding closing parenthesis is found). Example string:
"ver = '1.0'
if x.set('1.2'):
p = x.set('python_version', None)
x = x.set('test_template', DEFAULT, p(x,b),
z())"
The result I am looking for (using re.findall) is:
find_all_res = [['1.2'],['python_version', None],['test_template', DEFAULT, p(x,b),\nz()]]
Currently I'm using:
re.findall(pattern="(?<![0-9a-zA-Z_])x.set([\s\S]+?)(?<=[)])(\s)", string=value)
and the result I get is:
find_all_res = [[("('1.2'):\n p = x.set('python_version')", '\n'), ("('test_template', DEFAULT, p(x,b),\n z())", '\n')]
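The hard part of the requirement is "find the corresponding closing parenthesis": that needs tracking nesting depth, and Python's built-in re module has no recursion to express it. As a rough illustration of the requirement itself (not the answer's method), here is a standard-library sketch that scans forward from each x.set( and counts parentheses until the matching ) is reached, so the captured text can span lines. extract_call_bodies is a hypothetical helper; it assumes simple '...' and "..." string literals with backslash escapes, and it returns the raw text between the parentheses instead of splitting it into arguments.

```python
def extract_call_bodies(text, prefix="x.set("):
    """Hypothetical helper: return the text between each `prefix` and its
    matching ')' (nested parentheses and line breaks included).

    Assumes '...' and "..." literals with backslash escapes; parentheses
    inside them are ignored. Nested x.set( calls are not reported separately.
    """
    results = []
    pos = 0
    while True:
        start = text.find(prefix, pos)
        if start == -1:
            return results
        # Require a word boundary before the prefix, similar to (?<![0-9a-zA-Z_]).
        if start > 0 and (text[start - 1].isalnum() or text[start - 1] == "_"):
            pos = start + 1
            continue
        body_start = start + len(prefix)
        depth, quote, j = 1, None, body_start  # the prefix already opened one '('
        while j < len(text) and depth:
            ch = text[j]
            if quote:                # inside a string literal
                if ch == "\\":
                    j += 1           # skip the escaped character
                elif ch == quote:
                    quote = None     # literal closed
            elif ch in "'\"":
                quote = ch           # literal opened
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            j += 1
        if depth:                    # no matching ')' before the end of text
            return results
        results.append(text[body_start:j - 1])  # drop the closing ')'
        pos = j


sample = """ver = '1.0'
if x.set('1.2'):
p = x.set('python_version', None)
x = x.set('test_template', DEFAULT, p(x,b),
z())"""

print(extract_call_bodies(sample))
```

With the sample string this prints ["'1.2'", "'python_version', None", "'test_template', DEFAULT, p(x,b),\nz()"]; splitting each body into separate arguments would be a further step.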
Answer:
You can run pip install regex to install the PyPI regex library and then use:
\bx\.set\((?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]+|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))*\s*\)
Details:
\b - a word boundary
x\.set\( - the x.set( string
(?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]+|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))* - zero or more occurrences of:
  \s*(?:,\s*)? - zero or more whitespaces, then an optional comma followed by zero or more whitespaces
  (?<o> - group "o" (it will contain all the strings you need):
    [-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?| - a number pattern, or
    \w+(?<a>\((?:[^()]+|(?&a))*\))*| - one or more word chars followed by zero or more (...) substrings with any amount of nested parentheses, or
    '[^'\\]*(?:\\.[^'\\]*)*'| - a single-quoted string literal with escape sequence support, or
    "[^"\\]*(?:\\.[^"\\]*)*" - a double-quoted string literal with escape sequence support
  ) - end of group
\s* - zero or more whitespaces
\) - a ) char.
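The non-recursive alternatives inside group "o" can be tried on their own. A small sketch, assuming only the standard re module (which has no recursion, so the \w+(...) alternative that relies on (?&a) is left out): it checks a few sample inputs against the number and string-literal sub-patterns with re.fullmatch. The matches helper and the sample inputs are illustrative only. Note that, as written, the number pattern requires at least one digit after an optional dot, so a trailing-dot number such as 1. is not accepted.

```python
import re

# Non-recursive alternatives of group "o", copied from the pattern above.
# They use no recursion, so the standard re module can compile them.
NUMBER = r"[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?"
SQ_STRING = r"'[^'\\]*(?:\\.[^'\\]*)*'"
DQ_STRING = r'"[^"\\]*(?:\\.[^"\\]*)*"'


def matches(pattern, s):
    """True if the whole of s matches pattern."""
    return re.fullmatch(pattern, s) is not None


for pattern, s in [
    (NUMBER, "-1.5e3"),             # sign, fraction and exponent
    (NUMBER, ".5"),                 # leading digits are optional
    (NUMBER, "1."),                 # a trailing dot is NOT accepted
    (SQ_STRING, r"'it\'s'"),        # escaped quote inside the literal
    (DQ_STRING, '"say \\"hi\\""'),  # escaped double quotes
]:
    print(f"{s!r:18} {matches(pattern, s)}")
```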
See a Python demo:
import regex
text = r"""ver = '1.0'
if x.set('1.2'):
p = x.set('python_version', None)
x = x.set('test_template', DEFAULT, p(x,b),
z())"""
rx = r'''\bx\.set\((?:\s*(?:,\s*)?(?<o>[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?|\w+(?<a>\((?:[^()]+|(?&a))*\))*|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"))*\s*\)'''
print( [x.captures("o") for x in regex.finditer(rx, text, regex.S)] )
Output:
[["'1.2'"], ["'python_version'", 'None'], ["'test_template'", 'DEFAULT', 'p(x,b)', 'z()']]
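The captures keep the original source text, so string arguments still carry their quotes ("'1.2'"), whereas the question asked for ['1.2']. One possible post-processing step, assuming literal arguments should become Python values and everything else (names such as DEFAULT, calls such as p(x,b)) should stay as source text, is ast.literal_eval with a fallback. to_value is a hypothetical helper, and the captured list is copied from the output above so the sketch runs without the regex package.

```python
import ast

# Captures as printed by the demo above (copied so this runs on the stdlib alone).
captured = [["'1.2'"], ["'python_version'", 'None'],
            ["'test_template'", 'DEFAULT', 'p(x,b)', 'z()']]


def to_value(src):
    """Evaluate a Python literal ('1.2', None, 3.5, ...); keep anything else as text."""
    try:
        return ast.literal_eval(src)
    except (ValueError, SyntaxError, TypeError):
        return src  # a name or call such as DEFAULT or p(x,b)


result = [[to_value(arg) for arg in call] for call in captured]
print(result)
# [['1.2'], ['python_version', None], ['test_template', 'DEFAULT', 'p(x,b)', 'z()']]
```

ast.literal_eval only evaluates literals and never runs code, so it is safe to call on arbitrary captured text; anything it rejects falls back to the raw string.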