I'm dealing with equations like 'x_{t+1}+y_{t}=z_{t-1}'. My objective is to obtain all "variables", that is, a list with x_{t+1}, y_{t}, z_{t-1}.
I'd like to split the string by [+-=*/], but not if + or - are inside {}.
Something like re.split('(?<!t)[\+\-\=]','x_{t+1}+y_{t}=z_{t-1}') partly does the job by not splitting if it observes t followed by a symbol. But I'd like to be more general. Assume there are no nested brackets.
How can I do this?
CodePudding user response:
Instead of splitting at those characters, you could find sequences of all other characters (like x and _) and bracket parts (like {t+1}). The first such sequence in the example is x, _, {t+1}, i.e., the substring x_{t+1}.
import re
s = 'x_{t+1}+y_{t}=z_{t-1}'
print(re.findall(r'(?:\{.*?}|[^-+=*/])+', s))
Output:
['x_{t+1}', 'y_{t}', 'z_{t-1}']
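For comparison, here is a minimal sketch that puts the findall idea next to a re.split variant closer to what the question literally asked for. The function names find_variables and split_variables are made up for illustration, and the split version relies on the question's assumption that brackets are never nested: an operator counts as "inside {}" when a closing brace follows it with no other brace in between.

```python
import re

def find_variables(equation):
    """Collect runs made of whole {...} groups and non-operator characters."""
    return re.findall(r'(?:\{.*?\}|[^-+=*/])+', equation)

def split_variables(equation):
    """Split at operators, except those followed by a '}' with no brace before it.

    Assumes brackets are balanced and never nested.
    """
    parts = re.split(r'[-+=*/](?![^{}]*\})', equation)
    # Drop empty strings, e.g. the one produced by a leading minus sign.
    return [part for part in parts if part]

s = 'x_{t+1}+y_{t}=z_{t-1}'
print(find_variables(s))
print(split_variables(s))
```

Both calls should give the same three variables for the example string. Whitespace is not treated as an operator in either pattern, so for input like 'a + b' the pieces would keep their surrounding spaces; calling strip() on each piece is one way to handle that.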
CodePudding user response:
Instead of re.split, consider using re.findall to match only the variables:
>>> re.findall(r"[a-z0-9]+(?:_\{[^\}]+\})?","x_{t+1}+y_{t}=z_{t-1}+pi", re.IGNORECASE)
['x_{t+1}', 'y_{t}', 'z_{t-1}', 'pi']
Explanation of regex:
[a-z0-9]+(?:_\{[^\}]+\})?
[a-z0-9]+ : One or more alphanumeric characters
(?: )? : A non-capturing group, optional
_\{ \} : Underscore, and opening/closing brackets
[^\}]+ : One or more non-close-bracket characters
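The same pattern can also be written with re.VERBOSE so that the explanation lives next to each piece. This is only a sketch: the names VARIABLE and match_variables are made up here, and the pattern itself is unchanged from the one above.

```python
import re

# Same pattern as above, spread out with comments (re.VERBOSE ignores
# whitespace and everything after '#' outside character classes).
VARIABLE = re.compile(
    r"""
    [a-z0-9]+        # name: one or more letters or digits
    (?:              # optional subscript, not captured
        _\{          #   underscore and opening brace
        [^\}]+       #   one or more characters other than a closing brace
        \}           #   closing brace
    )?
    """,
    re.IGNORECASE | re.VERBOSE,
)

def match_variables(equation):
    """Return every name, with its {...} subscript when one follows."""
    return VARIABLE.findall(equation)

print(match_variables("x_{t+1}+y_{t}=z_{t-1}+pi"))
print(match_variables("2*x_{t}"))
```

Note that because digits are allowed in the name part, a numeric coefficient such as the 2 in 2*x_{t} is returned as if it were a variable. If that is unwanted, restricting the first character to a letter would be one possible adjustment.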