Regular expression to return string split up respecting nested parentheses

Time:12-24

I know many answers exist to the question of how to split a string while respecting parentheses, but none of them do so recursively. Take the string 1 2 3 (test 0, test 0) (test (0 test) 0):
Regex \s(?![^\(]*\)) returns "1", "2", "3", "(test 0, test 0)", "(test", "(0 test) 0)"
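For reference, that result can be reproduced with Python's built-in re module. This is only a minimal sketch, and the variable names are illustrative:

```python
import re

text = "1 2 3 (test 0, test 0) (test (0 test) 0)"

# Split at any whitespace that is NOT followed by a ")" with no "(" in between.
naive_parts = re.split(r"\s(?![^\(]*\))", text)

print(naive_parts)
# ['1', '2', '3', '(test 0, test 0)', '(test', '(0 test) 0)']
```

The lookahead only checks whether the next parenthesis is a closing one. It has no notion of nesting depth, so the space right after the outer "(test" is treated as if it were outside all parentheses.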
The regex I'm looking for would return either
"1", "2", "3", "(test 0, test 0)", "(test (0 test) 0)"
or
"1", "2", "3", "test 0, test 0", "test (0 test) 0"
which would let me recursively use it on the results again until no parentheses remain.
Ideally it would also respect escaped parentheses, but I only know the basics of regex and am not advanced enough to write this myself.
Does anyone have an idea how to approach this?

CodePudding user response:

Using only a regex for this task might work, but it wouldn't be straightforward.

Another possibility is writing a simple algorithm to track the parentheses in the string:

  1. Split the string at every parenthesis while keeping the delimiters (e.g. using re.split with a capturing group)
  2. Keep two counters tracking the parentheses: s for ( and e for ) (as in start, end).
  3. Using the counters, either split the current chunk at white space (when outside parentheses) or append it to a temp var (t) (when inside parentheses)
  4. When the leftmost parenthesis has been closed, append t to the list of values and reset the counters/temp var.

Here's an example:

import re

string = "1 2 3 (test 0, test 0) (test (0 test) 0)"


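# res: results, s: count of "(", e: count of ")", t: text of the current group
res, s, e, t = [], 0, 0, ""
for x in re.split(r"([()])", string):
    if not x.strip():
        continue  # skip empty or whitespace-only chunks
    elif x == "(":
        if s > 0:
            t += "("  # nested "(": keep it as part of the group text
        s += 1
    elif x == ")":
        e += 1
        if e == s:
            # the outermost parenthesis has been closed
            res.append(t)
            e, s, t = 0, 0, ""
        else:
            t += ")"  # nested ")": keep it as part of the group text
    elif s > e:
        t += x  # inside parentheses: accumulate
    else:
        res.extend(x.strip(" ").split(" "))  # outside: split at spaces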


print(res)
# ['1', '2', '3', 'test 0, test 0', 'test (0 test) 0']

Not very elegant, but works.
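The same counting idea can also be written as a character-by-character scan, which makes it easier to honor backslash-escaped parentheses and to apply the split recursively, as the question asks. The following is only a sketch under some assumptions: the function names split_top_level and parse_nested are made up for illustration, a backslash is assumed to escape the character after it, and groups are assumed to be separated by whitespace (so input like "(a)(b)" is not handled).

```python
def split_top_level(text):
    """Split text at whitespace that lies outside all parentheses.

    A backslash escapes the next character, so "\\(" and "\\)" do not
    change the nesting depth. Raises ValueError on unbalanced input.
    """
    parts, buf, depth, i = [], [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])  # keep the escaped pair verbatim
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        if ch.isspace() and depth == 0:
            if buf:  # a top-level token just ended
                parts.append("".join(buf))
                buf = []
        else:
            buf.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced '('")
    if buf:
        parts.append("".join(buf))
    return parts


def parse_nested(text):
    """Apply split_top_level recursively until no parentheses remain."""
    result = []
    for part in split_top_level(text):
        if part.startswith("(") and part.endswith(")"):
            result.append(parse_nested(part[1:-1]))  # strip outer parens, recurse
        else:
            result.append(part)
    return result


string = "1 2 3 (test 0, test 0) (test (0 test) 0)"
print(split_top_level(string))
# ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
print(parse_nested(string))
# ['1', '2', '3', ['test', '0,', 'test', '0'], ['test', ['0', 'test'], '0']]
```

A single depth counter replaces the two counters s and e here; the depth is simply the difference between them.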

CodePudding user response:

You can install the third-party regex module (pip install regex), which supports recursion, and use

import regex
text = "1 2 3 (test 0, test 0) (test (0 test) 0)"
matches = [match.group() for match in regex.finditer(r"(?:(\((?>[^()]+|(?1))*\))|\S)+", text)]
print(matches)
# => ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']

The regex matches:

  • (?: - start of a non-capturing group:
    • (\((?>[^()]+|(?1))*\)) - capturing group 1: a (, then zero or more runs of non-parenthesis chars or recursions into group 1 via (?1), then a ), i.e. a text between any nested parentheses
  • | - or
    • \S - any non-whitespace char
  • )+ - end of the group, repeated one or more times