I would like to split a string at spaces (and colons), except inside curly brackets and round brackets. Similar questions have been asked, but their answers fail with nested brackets.
Here is an example of a string to split:
p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}
The actual goal is to obtain a list of keys (p1, p2, p3 and p4) along with their values. When I split the string at spaces and colons, I can avoid splitting inside the curly brackets, but I cannot avoid splitting at some spaces inside the round brackets, because those brackets are nested.
The closest I got is
[\s:]+(?=[^\{\(\)\}]*(?:[\{\(]|$))
which works, except that it splits between (('mean', 5), and 0.0.
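The failure can be reproduced with the standard re module. This sketch assumes the separator part of the pattern is [\s:]+ (one or more spaces or colons). The lookahead only inspects the text up to the next bracket, so after (('mean', 5), it sees the opening bracket of ('std', 2) and allows a split, even though the outer round bracket is still open.

```python
import re

text = "p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}"

# Split at a run of spaces/colons only if the text after it reaches an opening
# bracket (or the end of the string) without crossing any closing bracket.
attempt = r"[\s:]+(?=[^\{\(\)\}]*(?:[\{\(]|$))"
parts = re.split(attempt, text)
print(parts)
# ['p1', 'I/out', 'p2', "(('mean', 5),", '0.0,', "('std', 2))", 'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]
```

The tuple ends up in three pieces, while the dict survives because every space inside it is followed by } before any opening bracket.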
Answer:
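Instead of splitting, you can match the tokens with the following pattern. It works in PCRE and with the PyPI regex module in Python; the built-in re module does not support the (?1)/(?2) recursion it relies on: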
(?:(\((?:[^()]+|(?1))*\))|(\{(?:[^{}]+|(?2))*})|[^\s:])+
See the regex demo.
It matches:
(?: - start of a non-capturing container group:
(\((?:[^()]+|(?1))*\)) - Group 1: a substring between balanced round brackets; (?1) recurses into Group 1, so nested pairs are matched as well
| - or
(\{(?:[^{}]+|(?2))*}) - Group 2: a substring between balanced braces, with (?2) handling nested braces the same way
| - or
[^\s:] - a single char other than whitespace and colon
)+ - one or more occurrences of any of the above.
See the Python demo:
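import regex  # PyPI module (pip install regex); the stdlib re lacks (?1) recursion

text = "p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}"
# Each match is one token: a bracketed group (nesting allowed) or a run of chars other than whitespace/colon
pattern = r"(?:(\((?:[^()]+|(?1))*\))|(\{(?:[^{}]+|(?2))*})|[^\s:])+"
print([x.group() for x in regex.finditer(pattern, text)])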
Output:
['p1', 'I/out', 'p2', "(('mean', 5), 0.0, ('std', 2))", 'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]
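Since the question's actual goal is keys with their values, the flat token list can be paired up afterwards. This sketch assumes the tokens strictly alternate key, value, which holds here because every value is a single token. Converting the values with ast.literal_eval is an optional extra, assuming they are meant as Python literals; anything that is not a literal (such as I/out) is kept as a string.

```python
import ast

# Token list as printed by the regex demo above.
tokens = ['p1', 'I/out', 'p2', "(('mean', 5), 0.0, ('std', 2))",
          'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]

# Pair even positions (keys) with odd positions (values).
params = dict(zip(tokens[::2], tokens[1::2]))


def to_python(value):
    """Try to parse a value string as a Python literal; keep the raw string otherwise."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


parsed = {key: to_python(value) for key, value in params.items()}
print(parsed)
```

If a value could ever contain a top-level space or colon, the alternation assumption breaks and the keys would need to be recognized explicitly instead.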