Splitting string, ignoring brackets including nested brackets

Time:12-04

I would like to split a string at spaces (and colons), except inside curly brackets and rounded brackets. Similar questions have been asked, but the answers fail with nested brackets.

Here is an example of a string to split:

p1: I/out   p2: (('mean', 5), 0.0, ('std', 2))   p3: 7   p4: {'name': 'check', 'value': 80.0}

The actual goal is to obtain a list of keys (p1, p2, p3 and p4) along with their values. When I split the string at spaces and colons, I can avoid splitting inside the curly brackets. However, I cannot avoid splitting at some of the spaces inside the round brackets, because those brackets are nested.

The closest I got is

[\s:]+(?=[^\{\(\)\}]*(?:[\{\(]|$))

which is fine except that it splits between (('mean', 5), and 0.0.

CodePudding user response:

You can use the following pattern, which works in PCRE and in the Python regex module from PyPI (the built-in re module does not support the recursion it relies on):

(?:(\((?:[^()]++|(?1))*\))|(\{(?:[^{}]++|(?2))*})|[^\s:])+

It matches

  • (?: - start of a container non-capturing group:
    • (\((?:[^()]++|(?1))*\)) - Group 1: a substring between balanced round brackets; (?1) recurses into Group 1 to handle nesting
    • | - or
    • (\{(?:[^{}]++|(?2))*}) - Group 2: a substring between balanced braces; (?2) recurses into Group 2 to handle nesting
    • | - or
    • [^\s:] - a char other than whitespace and colon
  • )+ - end of the group, one or more occurrences.

See the Python demo:

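import regex  # third-party module (pip install regex); the built-in re has no recursion

text = "p1: I/out   p2: (('mean', 5), 0.0, ('std', 2))   p3: 7   p4: {'name': 'check', 'value': 80.0}"
# (?1) and (?2) recurse into groups 1 and 2 so nested brackets stay in one match
pattern = r"(?:(\((?:[^()]++|(?1))*\))|(\{(?:[^{}]++|(?2))*})|[^\s:])+"
print([x.group() for x in regex.finditer(pattern, text)])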

Output:

['p1', 'I/out', 'p2', "(('mean', 5), 0.0, ('std', 2))", 'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]
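
If installing a third-party module is not an option, the same tokens can be produced with a small depth-tracking loop using only the standard library. This is a different technique from the recursive regex above, offered as a minimal sketch: the function name split_outside_brackets and its separators parameter are made up for illustration, and the code assumes that brackets are balanced and that no bracket characters appear inside quoted strings.

```python
def split_outside_brackets(text, separators=" \t\n:"):
    """Split text at separator characters, but only outside () and {}.

    Hypothetical helper: assumes balanced brackets and no bracket
    characters hidden inside quoted strings.
    """
    closers = {"(": ")", "{": "}"}
    tokens = []
    current = []  # characters of the token being built
    stack = []    # closing brackets we are still waiting for

    for ch in text:
        if ch in closers:
            # Entering a bracketed region: remember which closer ends it.
            stack.append(closers[ch])
            current.append(ch)
        elif stack and ch == stack[-1]:
            # Leaving the innermost bracketed region.
            stack.pop()
            current.append(ch)
        elif not stack and ch in separators:
            # Separator at depth 0: finish the current token, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            # Ordinary character, or a separator inside brackets.
            current.append(ch)

    if current:
        tokens.append("".join(current))
    return tokens


text = "p1: I/out   p2: (('mean', 5), 0.0, ('std', 2))   p3: 7   p4: {'name': 'check', 'value': 80.0}"
tokens = split_outside_brackets(text)

# Tokens alternate key, value, key, value, ... so pair them up.
params = dict(zip(tokens[0::2], tokens[1::2]))
print(tokens)
print(params)
```

For the sample string this yields the same token list as the regex solution, and params maps each key (p1 to p4) to its value as a string. If the values need to become Python objects, something like ast.literal_eval could be applied to the ones that are valid literals, though a bare value such as I/out would have to stay a string.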