Python Updating a regex dictionary using the previous expression/key?

Time:10-16

I have a function that replaces a named rule in a dictionary value, such as #digit#, with its regular-expression counterpart, such as [0-9].

So for example, calling this function:

def expand_re(pat_dict:{str:str}):
    pat_list = list(pat_dict.items())
    for key, rule in pat_dict.items():
        expression = re.compile(r'#\w+#')
        pat_dict[key] = re.sub(expression, f"(?:{pat_list[0][1]})", rule)
    return pat_dict

on this dictionary:

pd = dict(digit = r'[0-9]', integer = r'[+-]?#digit##digit#*')

would produce:

{'digit': '[0-9]', 'integer': '[+-]?(?:[0-9])(?:[0-9])*'}

And it works just fine for that. However, if each rule refers to the rule defined just before it, it does not work. So calling expand_re on the dictionary:

pd = dict(a='correct',b='#a#',c='#b#',d='#c#',e='#d#',f='#e#',g='#f#')

My function produces:

{'a': 'correct',
 'b': '(?:correct)',
 'c': '(?:correct)',
 'd': '(?:correct)',
 'e': '(?:correct)',
 'f': '(?:correct)',
 'g': '(?:correct)'
}

When I want it to produce:

{'a': 'correct',
 'b': '(?:correct)',
 'c': '(?:(?:correct))',
 'd': '(?:(?:(?:correct)))',
 'e': '(?:(?:(?:(?:correct))))',
 'f': '(?:(?:(?:(?:(?:correct)))))',
 'g': '(?:(?:(?:(?:(?:(?:correct))))))'
}

How would I be able to do this? I've tried using the dictionary's .update() method, but to no avail.

Answer:

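Here is a fix of your code that produces the output you want. Your original always substituted pat_list[0][1], the first value in the dictionary, no matter which name was matched; this version captures the name between the # marks and looks up that rule instead. Be aware, though, that the rules are still expanded in dictionary order, so a rule is only fully expanded if every rule it references appears earlier in the dictionary. It won't do what you want if the keys are shuffled.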

import re
def expand_re(pat_dict:{str:str}): 
    expression = re.compile(r'#(\w+)#')
    for key in pat_dict:
        pat_dict[key] = re.sub(expression, lambda x: f"(?:{pat_dict[x.group(1)]})", pat_dict[key])
    return pat_dict

d = dict(a='correct',b='#a#',c='#b#',d='#c#',e='#d#',f='#e#',g='#f#')

expand_re(d)

Output:

{'a': 'correct',
 'b': '(?:correct)',
 'c': '(?:(?:correct))',
 'd': '(?:(?:(?:correct)))',
 'e': '(?:(?:(?:(?:correct))))',
 'f': '(?:(?:(?:(?:(?:correct)))))',
 'g': '(?:(?:(?:(?:(?:(?:correct))))))'}

Example demonstrating the (potential) flaw in the logic, with e defined before d:

>>> expand_re(dict(a='correct',b='#a#',c='#b#',e='#d#',d='#c#',f='#e#',g='#f#'))
{'a': 'correct',
 'b': '(?:correct)',
 'c': '(?:(?:correct))',
 'e': '(?:#c#)',
 'd': '(?:(?:(?:correct)))',
 'f': '(?:(?:#c#))',
 'g': '(?:(?:(?:#c#)))'}
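
If the rules might not be defined in dependency order, one option is to resolve each reference recursively instead of relying on dictionary order. The following is only a sketch under a few assumptions: the name expand_re_any_order and the in_progress bookkeeping are made up for illustration, it returns a new dictionary rather than modifying the input, and a reference to an undefined rule still raises KeyError, as in the version above.

```python
import re

# Matches a named reference such as #digit# and captures the name.
REFERENCE = re.compile(r'#(\w+)#')

def expand_re_any_order(pat_dict):
    """Return a new dict with every #name# reference fully expanded."""
    resolved = {}  # cache of rules that are already fully expanded

    def resolve(name, in_progress=()):
        if name in resolved:
            return resolved[name]
        if name in in_progress:
            # A rule that (indirectly) references itself can never be expanded.
            raise ValueError(f"circular rule reference: {name}")
        expanded = REFERENCE.sub(
            lambda m: f"(?:{resolve(m.group(1), in_progress + (name,))})",
            pat_dict[name],
        )
        resolved[name] = expanded
        return expanded

    # Keys come out in the same order as in the input dictionary.
    return {name: resolve(name) for name in pat_dict}
```

With the shuffled dictionary from the example above, e is expanded through d and c even though d is defined after it, so expand_re_any_order(...)['e'] is '(?:(?:(?:(?:correct))))' rather than '(?:#c#)'.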

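Version without regex (it only handles values that consist of exactly one #name# reference, so it would not expand the integer rule from the question):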

def expand_re(pat_dict:{str:str}): 
    for key, value in pat_dict.items():
        if value.startswith('#') and value.endswith('#') and value[1:-1] in pat_dict:
            pat_dict[key] = f'(?:{pat_dict[value[1:-1]]})'
    return pat_dict