Regex issues with replacing placeholders with dictionary key values

Time:04-20

This is with reference to the question: Replacing placeholders with dictionary keys/values

I have placeholders that are the same as in the referenced question, except for the last one. Here I also need to replace placeholders like $fil_TABLE_NAME1, where the $fil_ prefix stays the same but the table name differs (its parts are separated by underscores, and it can contain digits):

placeholders = {r'\$plc_hldr1': '1111',
                r'\$plc_hldr2': 'abcd',
                r'\$\d*date_placeholder': '20200101',
                r'\$fil_\w+': '(select * from table)'
                }
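As a quick sanity check of the last key: `\w` matches letters, digits and the underscore, so `\$fil_\w+` accepts table names split by underscores and containing digits, and it stops at the first non-word character such as a space. The table names below are made up for illustration:

```python
import re

# The $fil_ key from the dictionary above: "$fil_" followed by one or more word characters
fil_pattern = re.compile(r'\$fil_\w+')

# Hypothetical table names: \w covers both the underscores and the digits
for name in ['$fil_SAMPLE_TABLE_NAME', '$fil_TABLE_NAME1', '$fil_orders_2020']:
    print(name, bool(fil_pattern.fullmatch(name)))

# \w+ is greedy, but the match ends at the space after the table name
m = fil_pattern.search('sometext $fil_SAMPLE_TABLE_NAME some more text')
print(m.group(0))  # $fil_SAMPLE_TABLE_NAME
```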

For the replacement I'm using code adapted from the referenced question:

import re

def remove_escape_chars(reggie):
    return re.sub(r'\\\$\\d\*|\$\d*|\\\$fil\\\_\\\w\\\+|\\', '', reggie)   # modification

def multiple_replace(escape_dict, text):
    # Create a second dictionary to look up regex match replacement targets
    unescaped_placeholders = {remove_escape_chars(k): escape_dict[k] for k in escape_dict}

    # Create a regular expression from all of the dictionary keys
    regex = re.compile("|".join(escape_dict.keys()))
    return regex.sub(lambda match: unescaped_placeholders[remove_escape_chars(match.group(0))], text)

But when I execute it with

text = "sometext $fil_SAMPLE_TABLE_NAME some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder"

result = multiple_replace(placeholders, text)
print(result)

I get:

sometext $fil_SAMPLE_TABLE_NAME some more text abcd some more more text 20200101 some text 20200101

$fil_SAMPLE_TABLE_NAME is not replaced.

I think there is an issue in the regular expression, maybe something escaped incorrectly, but after several modifications I have not been able to find it.
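One way to narrow this down is to compare what remove_escape_chars returns for a dictionary key and for the text that key matched, since the lookup in multiple_replace only succeeds when the two are equal. A small check, assuming the $fil_ key is written r'\$fil_\w+':

```python
import re

def remove_escape_chars(reggie):
    # Same normalization as in the question (with "+" after \w)
    return re.sub(r'\\\$\\d\*|\$\d*|\\\$fil\\\_\\\w\\\+|\\', '', reggie)

# Normalized dictionary key vs. normalized matched text
date_key = remove_escape_chars(r'\$\d*date_placeholder')
date_hit = remove_escape_chars('$1234date_placeholder')
fil_key = remove_escape_chars(r'\$fil_\w+')
fil_hit = remove_escape_chars('$fil_SAMPLE_TABLE_NAME')

print(date_key, date_hit)  # date_placeholder date_placeholder
print(fil_key, fil_hit)    # fil_w+ fil_SAMPLE_TABLE_NAME
```

For the date placeholder both sides normalize to the same string, but for $fil_ they cannot: \w+ matches variable text, which never equals the stripped pattern. Depending on whether the $fil_ pattern matches at all, that placeholder is either left untouched or the dictionary lookup raises KeyError.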

Could anybody help me, please?

Answer:

I would take a slightly different approach to this. Rather than trying to work out which regex matched by stripping escape characters from the matched text, create a single regex in which each individual regex is wrapped in its own group, and then use the number of the group that matched to look up the replacement value. For your sample data, the regex would look like this:

(\$plc_hldr1)|(\$plc_hldr2)|(\$\d*date_placeholder)|(\$fil_\w+)
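To see which group number each placeholder in the sample text produces, a quick check is to list m.lastindex for every match of this combined regex (a sketch using the sample text from the question):

```python
import re

text = ("sometext $fil_SAMPLE_TABLE_NAME some more text $plc_hldr2 "
        "some more more text $1234date_placeholder some text $5678date_placeholder")

# One capturing group per placeholder regex
regex = re.compile(r'(\$plc_hldr1)|(\$plc_hldr2)|(\$\d*date_placeholder)|(\$fil_\w+)')

# lastindex is the number (1..4) of the group that produced each match
for m in regex.finditer(text):
    print(m.lastindex, m.group(0))
```

This prints 4, 2, 3 and 3 next to the four placeholders, in the order they appear in the text; those numbers minus one are the positions of the matching values in the dictionary.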

The Python code would then be:

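import re

placeholders = {r'\$plc_hldr1': '1111',
                r'\$plc_hldr2': 'abcd',
                r'\$\d*date_placeholder': '20200101',
                r'\$fil_\w+': '(select * from table)'
                }
# Replacement values in the same order as the keys (dicts keep insertion order in Python 3.7+)
replacements = list(placeholders.values())

text = "sometext $fil_SAMPLE_TABLE_NAME some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder"

# Wrap every placeholder regex in its own capturing group: (p1)|(p2)|...
regex = re.compile('(' + ')|('.join(placeholders.keys()) + ')')
# m.lastindex is the number of the group that matched, so it indexes into replacements
print(regex.sub(lambda m: replacements[m.lastindex-1], text))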

Output:

sometext (select * from table) some more text abcd some more more text 20200101 some text 20200101

Note that this requires every group inside the placeholder regexes to be non-capturing, i.e. (?:...) rather than (...), because an extra capturing group would shift the group numbers used to index into replacements.
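If a placeholder regex does need a capturing group, one possible workaround (a sketch, not part of the approach above) is to wrap each pattern in a uniquely named group and look the replacement up via m.lastgroup instead of m.lastindex. The group names _ph0, _ph1, ... and the modified date pattern are made up for illustration, and the sketch assumes the placeholder patterns do not use those names themselves:

```python
import re

# Hypothetical variant: the date pattern now contains a capturing group (\d*),
# which would break a lookup based on positional group numbers.
placeholders = {
    r'\$plc_hldr1': '1111',
    r'\$plc_hldr2': 'abcd',
    r'\$(\d*)date_placeholder': '20200101',
    r'\$fil_\w+': '(select * from table)',
}

# Wrap each pattern in its own named outer group and remember its value
replacement_by_name = {}
parts = []
for i, (pattern, value) in enumerate(placeholders.items()):
    name = f'_ph{i}'
    replacement_by_name[name] = value
    parts.append(f'(?P<{name}>{pattern})')
regex = re.compile('|'.join(parts))

def replace_placeholders(text):
    # The outer named group closes last, so m.lastgroup is its name
    # even when the pattern inside it has capturing groups of its own.
    return regex.sub(lambda m: replacement_by_name[m.lastgroup], text)

text = ("sometext $fil_SAMPLE_TABLE_NAME some more text $plc_hldr2 "
        "some more more text $1234date_placeholder some text $5678date_placeholder")
print(replace_placeholders(text))
```

Because the lookup goes by group name rather than group position, the inner (\d*) no longer shifts anything, and the output for the sample text is the same as above.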
