I know many answers exist to the question of how to split a string while respecting parentheses, but none of them do so recursively.
Looking at the string 1 2 3 (test 0, test 0) (test (0 test) 0):
The regex \s(?![^\(]*\)) returns "1", "2", "3", "(test 0, test 0)", "(test", "(0 test) 0)"
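To reproduce this with the standard re module (a minimal sketch; the variable names are only illustrative). The lookahead only inspects the next parenthesis after each space, so the space after the second "(test" is followed by "(" and gets treated as if it were outside any group:

```python
import re

text = "1 2 3 (test 0, test 0) (test (0 test) 0)"

# Split at whitespace unless the next parenthesis after it is a closing one.
# Inside a nested group the next parenthesis can be "(", so the check is fooled.
parts = re.split(r"\s(?![^\(]*\))", text)
print(parts)
# ['1', '2', '3', '(test 0, test 0)', '(test', '(0 test) 0)']
```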
The regex I'm looking for would return either
"1", "2", "3", "(test 0, test 0)", "(test (0 test) 0)"
or
"1", "2", "3", "test 0, test 0", "test (0 test) 0"
which would let me recursively use it on the results again until no parentheses remain.
Ideally it would also respect escaped parentheses, but I'm not that advanced with regex; I only know the basics.
Does anyone have an idea how to approach this?
CodePudding user response:
Using regex only for the task might work, but it wouldn't be straightforward.
Another possibility is writing a simple algorithm to track the parentheses in the string:
- Split the string at all parentheses, while returning the delimiters (e.g. using re.split).
- Keep two counters tracking the parentheses: s for ( and e for ) (as in start, end).
- Using the counters, proceed by either splitting at whitespace or adding the current data to a temp var (t).
- When the leftmost parenthesis has been closed, append t to the list of values and reset the counters and temp var.
Here's an example:
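import re

string = "1 2 3 (test 0, test 0) (test (0 test) 0)"

# res: results, s: count of "(", e: count of ")", t: text of the current group
res, s, e, t = [], 0, 0, ""
for x in re.split(r"([()])", string):
    if not x.strip():
        # Skip empty or whitespace-only chunks between parentheses
        continue
    elif x == "(":
        if s > 0:
            # Nested opening parenthesis: keep it as part of the group
            t += "("
        s += 1
    elif x == ")":
        e += 1
        if e == s:
            # Outermost parenthesis closed: store the group and reset
            res.append(t)
            e, s, t = 0, 0, ""
        else:
            # Nested closing parenthesis: keep it as part of the group
            t += ")"
    elif s > e:
        # Inside a group: accumulate the text as-is
        t += x
    else:
        # Outside any group: split at spaces
        res.extend(x.strip(" ").split(" "))

print(res)
# ['1', '2', '3', 'test 0, test 0', 'test (0 test) 0']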
Not very elegant, but works.
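For reuse, the same counting idea can also be applied character by character, which makes it easy to honor backslash-escaped parentheses as the question asks. This is only a sketch: split_top_level is a hypothetical name, using a backslash as the escape character is an assumption, and unlike the loop above it keeps the outer parentheses on each group.

```python
def split_top_level(text, escape="\\"):
    """Split on whitespace outside parentheses; the escape char protects the next char."""
    parts, current, depth = [], [], 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # Keep the escape sequence verbatim and do not count it as a parenthesis.
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at index {i}")
        if ch.isspace() and depth == 0:
            # Top-level whitespace ends the current token (if any).
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced '('")
    if current:
        parts.append("".join(current))
    return parts


print(split_top_level("1 2 3 (test 0, test 0) (test (0 test) 0)"))
# ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
print(split_top_level(r"a \( b"))
# ['a', '\\(', 'b']
```

Raising on unbalanced input is a design choice; returning the partial result instead would also be reasonable.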
CodePudding user response:
You can install the PyPI regex module (pip install regex) and use
import regex
text = "1 2 3 (test 0, test 0) (test (0 test) 0)"
matches = [match.group() for match in regex.finditer(r"(?:(\((?>[^()]+|(?1))*\))|\S)+", text)]
print(matches)
# => ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
See the online Python demo. See the regex demo. The regex matches:
- (?: - start of a non-capturing group:
  - (\((?>[^()]+|(?1))*\)) - a text between any nested parentheses
  - | - or
  - \S - any non-whitespace char
- )+ - end of the group, repeat one or more times
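If the end goal is the fully nested structure (splitting again inside each group until no parentheses remain), one alternative to re-applying a pattern to its own results is a small recursive parser built on the standard re module. A sketch only: parse_nested and TOKEN are hypothetical names, and treating a backslash as an escape character is an assumption.

```python
import re

# A token is a single parenthesis, or a run of escaped chars / non-space, non-paren chars.
TOKEN = re.compile(r"[()]|(?:\\.|[^\s()])+")


def parse_nested(text):
    """Return a nested list; each parenthesised group becomes a sub-list of its tokens."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_group(depth):
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == "(":
                # Recurse: everything up to the matching ")" becomes a sub-list.
                items.append(parse_group(depth + 1))
            elif tok == ")":
                if depth == 0:
                    raise ValueError("unbalanced ')'")
                return items
            else:
                items.append(tok)
        if depth > 0:
            raise ValueError("unbalanced '('")
        return items

    return parse_group(0)


print(parse_nested("1 2 3 (test 0, test 0) (test (0 test) 0)"))
# ['1', '2', '3', ['test', '0,', 'test', '0'], ['test', ['0', 'test'], '0']]
```

Escaped parentheses such as \( stay inside the surrounding token, so they never open or close a group.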