regex: split by + except if inside brackets

Time:07-12

I'm dealing with equations like 'x_{t+1}+y_{t}=z_{t-1}'. My objective is to obtain all "variables", that is, a list with x_{t+1}, y_{t}, z_{t-1}.

I'd like to split the string by [+-=*/], but not if + or - are inside {}.

Something like re.split('(?<!t)[\+\-\=]','x_{t+1}+y_{t}=z_{t-1}') partly does the job by not splitting when the symbol comes right after a t. But I'd like something more general. Assume there are no nested brackets.

How can I do this?

CodePudding user response:

Instead of splitting at those characters, you could find sequences of all other characters (like x and _) and bracket parts (like {t+1}). The first such sequence in the example is x, _, {t+1}, i.e., the substring x_{t+1}.

import re

s = 'x_{t+1}+y_{t}=z_{t-1}'

print(re.findall(r'(?:\{.*?}|[^-+=*/])+', s))

Output:

['x_{t+1}', 'y_{t}', 'z_{t-1}']
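
As a rough sketch of how this could be wrapped up for reuse, the snippet below applies the same pattern to an input that also contains * and / and spaces around the operators. The names TOKEN and find_variables are made up for illustration, and stripping whitespace is an added assumption: spaces are not in the excluded character set, so they would otherwise stay attached to the matches.

```python
import re

# Same pattern as in the answer above: a run of brace groups ({...})
# and characters that are not one of the operators - + = * /
TOKEN = re.compile(r'(?:\{.*?}|[^-+=*/])+')

def find_variables(equation):
    """Return the operator-free chunks of `equation` (hypothetical helper)."""
    # Spaces are not operators, so they end up inside the matches;
    # strip them and drop any chunk that was whitespace only.
    return [m.strip() for m in TOKEN.findall(equation) if m.strip()]

print(find_variables('x_{t+1}+y_{t}=z_{t-1}'))
print(find_variables('a_{i*j} / b_{k-1} * c'))
```

Like the original, this assumes braces are balanced and not nested.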

CodePudding user response:

Instead of re.split, consider using re.findall to match only the variables:

>>> re.findall(r"[a-z0-9]+(?:_\{[^\}]+\})?", "x_{t+1}+y_{t}=z_{t-1}+pi", re.IGNORECASE)
['x_{t+1}', 'y_{t}', 'z_{t-1}', 'pi']


Explanation of regex:

[a-z0-9]+(?:_\{[^\}]+\})?
[a-z0-9]+                : One or more alphanumeric characters
         (?:           )?: A non-capturing group, optional
            _\{      \}  : Underscore, and opening/closing brackets
               [^\}]+    : One or more non-close-bracket characters
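
The same breakdown can also live inside the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch of that layout. The name VARIABLE is made up for illustration, and the behaviour should match the one-line regex above.

```python
import re

# Hypothetical name; same regex as above, spread over several lines.
VARIABLE = re.compile(
    r"""
    [a-z0-9]+        # one or more alphanumeric characters
    (?:              # optional non-capturing group:
        _\{          #   underscore and opening bracket
        [^}]+        #   one or more non-close-bracket characters
        \}           #   closing bracket
    )?
    """,
    re.IGNORECASE | re.VERBOSE,
)

variables = VARIABLE.findall("x_{t+1}+y_{t}=z_{t-1}+pi")
print(variables)
```

Note that, unlike the first answer, this pattern only accepts alphanumeric names followed by at most one _{...} subscript, so anything else in the input is silently skipped.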