Python: use re.sub with a dict to replace multiple exact substrings

Time:09-13

In Python, I am trying to use re.sub with a dict to replace multiple "exact" substrings (whole words only). Here is a sample:

import re

words = "apple pineapple cat category data old_data"
dic = {"apple":"apple_new", "cat":"cat_new", "data":"data_new"}

pattern = re.compile("|".join(dic.keys()))

new_words = re.sub(pattern, lambda m: dic[m.group(0)], words)

The result is:

apple_new pineapple_new cat_new cat_newegory data_new old_data_new

The result I expected is:

apple_new pineapple cat_new category data_new old_data

What should I do? I have tried the word-boundary approach (r"\b...\b") from the question "Replace exact substring in python":

new_words2 = re.sub(r"\bapple\b", "apple_new", words)

This does what I want for a single word. But in my real project I have to replace about 100 patterns at once, so I would like to combine the dict with the r"\b...\b" approach.

If you know how to do this, please tell me. Thanks.

CodePudding user response:

You can change the pattern definition to:

pattern = re.compile(fr'\b(?:{"|".join(dic.keys())})\b')
# re.compile(r'\b(?:apple|cat|data)\b', re.UNICODE)

Output of re.sub(pattern, lambda m: dic[m.group(0)], words):

'apple_new pineapple cat_new category data_new old_data'
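
For a dict of about 100 entries, as mentioned in the question, the same idea can be wrapped in a small helper. This is only a minimal sketch: the name replace_exact is hypothetical, and escaping the keys and sorting them longest first are assumptions added here so that a key such as "new york" is tried before "new".

```python
import re

def replace_exact(text, mapping):
    """Replace whole-word occurrences of each key in mapping with its value."""
    # Longest keys first, so a key is not shadowed by a shorter key
    # that is its prefix (the regex tries alternatives left to right).
    keys = sorted(mapping, key=len, reverse=True)
    # Escape each key so regex metacharacters are matched literally.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    # One pass over the text; each match is looked up in the dict.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

words = "apple pineapple cat category data old_data"
dic = {"apple": "apple_new", "cat": "cat_new", "data": "data_new"}
print(replace_exact(words, dic))
```

Note that \b only works as intended when each key starts and ends with a word character (a letter, digit or underscore).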

CodePudding user response:

We can place word boundaries around the regex alternation to prevent unwanted substring matches. Strictly speaking, we should also use re.escape on each entry in the alternation.

import re

words = "apple pineapple cat category data old_data"
dic = {"apple":"apple_new", "cat":"cat_new", "data":"data_new"}

# Escape each key, join the keys with "|", and wrap the group in word boundaries
pattern = r'\b(' + r'|'.join([re.escape(x) for x in dic.keys()]) + r')\b'
new_words = re.sub(pattern, lambda m: dic[m.group(0)], words)
print(new_words)  # apple_new pineapple cat_new category data_new old_data
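
To see why re.escape matters, here is a small illustration. The sample text and the key "v1.0" are made up for this sketch and are not from the question. Without escaping, the "." in the key matches any character, so the pattern also picks up "v120", and looking that match up in the dict would then raise a KeyError.

```python
import re

text = "v1.0 v120 v1.0beta"   # hypothetical sample text
mapping = {"v1.0": "v2.0"}    # hypothetical key containing a metacharacter

# Without re.escape, "." means "any character".
unescaped = re.compile(r"\b(?:" + "|".join(mapping) + r")\b")
print(unescaped.findall(text))  # ['v1.0', 'v120']

# With re.escape, "." only matches a literal dot.
escaped = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in mapping) + r")\b")
result = escaped.sub(lambda m: mapping[m.group(0)], text)
print(result)  # v2.0 v120 v1.0beta
```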