Regex to handle a varying number of variables

Time:09-16

I'm trying to change a string that looks something like this:

s = 'g1 & g2 & (X~(~g1 & ~g2) & ~o1) & (XX~(~g1 & ~g2) & ~o1 & X~o1)'

to this:

'g1_0 & g2_0 & (~(~g1_1 & ~g2_1) & ~o1_0) & (~(~g1_2 & ~g2_2) & ~o1_0 & ~o1_1)'

So basically, I append _# (an underscore and a number) to each variable, where the number is the count of X's in front of it, and then remove the X's. The difficulty arises mostly when the X's precede parentheses, because I do not know a priori how many variables or which logical operators appear inside the parentheses.

I am trying to do this in Python. I work backwards from the largest number of X's (because if I start by looking for plain g1's, all of them will change). So this is the sequence:

import re
# n is assumed to hold the largest number of consecutive X's in s
xs = 'X'*n
while n>0:
  # this is for when we have parentheses
  s = re.sub('%s([~]*)([(]+[~]*[a-zA-Z]+[0-9]+) ([&|]*) ([~]*[a-zA-Z]+[0-9]+)([)]+)'%xs, \
                          r'\1\2_%d \3 \4_%d\5'%(n,n), s)
  # this is for normal variables
  s = re.sub('%s([~]*[a-zA-Z]*[0-9]*)'%xs, r'\1_%d'%n, s)
  xs = xs[:-1]
  n -= 1

And so on, down to no X's. The problem is that I don't want to hard-code the structure 'o/g &/| o/g'. I want to allow any number of names and operators inside the parentheses while still assigning the correct suffixes. For example, it should handle:

XX(~g1 & ~g2 | ~k3)  --> (~g1_2 & ~g2_2 | ~k3_2)

How can I do it with Regex?

CodePudding user response:

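You can use recursion with re. Each call handles one parenthesized group, and c carries the number of X's inherited from the enclosing groups (the walrus operator requires Python 3.8+):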

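import re
def rep_x(d, c = 0):
   # d: remaining input; c: X count inherited from enclosing parentheses
   # s: output built so far; f: X count seen just before the next token
   s, f = '', 0
   while d:
      if d[0] == ')':
         # end of this group: return the result and the unconsumed input
         return s+')', d[1:]
      if d[0] == '(':
         # recurse into the group, passing down the accumulated X count
         [_s, d], f = rep_x(d[1:], c = c+f), 0
         s += '('+_s
      elif (x:=re.findall(r'^X+', d)):
         # a run of X's: remember its length and drop it from the output
         d = d[(f:=len(x[0])):]
      elif (x:=re.findall(r'^\w+', d)):
         # a variable name: append the suffix and reset the pending count
         s, f, d = s + x[0]+'_'+str(f+c), 0, d[len(x[0]):]
      else:
         # any other character (space, ~, &, |) is copied unchanged
         s, d = s+d[0], d[1:]
   return s, d

r1, _ = rep_x('g1 & g2 & (X~(~g1 & ~g2) & ~o1) & (XX~(~g1 & ~g2) & ~o1 & X~o1)')
r2, _ = rep_x('XX(~g1 & ~g2 | ~k3)')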

Output:

'g1_0 & g2_0 & (~(~g1_1 & ~g2_1) & ~o1_0) & (~(~g1_2 & ~g2_2) & ~o1_0 & ~o1_1)'
'(~g1_2 & ~g2_2 | ~k3_2)'
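
If recursion depth is a concern for deeply nested formulas, the same idea can probably be written as a loop with an explicit stack. The following is only a sketch under a few assumptions: variable names never start with X, parentheses are balanced, and a run of X's applies to the next variable or parenthesized group. The names TOKEN and shift_indices are made up for this illustration and are not part of the answer above.

```python
import re

# One token is a run of X's, a name, or any other single character.
TOKEN = re.compile(r"(?P<xs>X+)|(?P<name>\w+)|(?P<other>.)", re.DOTALL)

def shift_indices(expr):
    out = []
    offsets = [0]   # offsets[-1] is the X count inherited at the current depth
    pending = 0     # X's seen since the last name or "("
    for m in TOKEN.finditer(expr):
        kind, tok = m.lastgroup, m.group()
        if kind == "xs":
            pending = len(tok)          # drop the X's, remember how many
        elif kind == "name":
            out.append(f"{tok}_{offsets[-1] + pending}")
            pending = 0
        elif tok == "(":
            offsets.append(offsets[-1] + pending)  # group inherits the count
            pending = 0
            out.append(tok)
        elif tok == ")":
            if len(offsets) > 1:        # tolerate a stray ")"
                offsets.pop()
            out.append(tok)
        else:
            out.append(tok)             # spaces, ~, &, | are copied unchanged
    return "".join(out)

print(shift_indices('XX(~g1 & ~g2 | ~k3)'))
```

The stack plays the role of the c argument in the recursive version: entering a parenthesis pushes the inherited count, and leaving it pops back to the enclosing one.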