In Python, I am trying to use re.sub with a dict to replace multiple "exact" (whole-word) substrings. Here is a minimal example:
import re
words = "apple pineapple cat category data old_data"
dic = {"apple":"apple_new", "cat":"cat_new", "data":"data_new"}
pattern = re.compile("|".join(dic.keys()))
new_words = re.sub(pattern, lambda m: dic[m.group(0)], words)
The result is:
apple_new pineapple_new cat_new cat_newegory data_new old_data_new
The result I expected is:
apple_new pineapple cat_new category data_new old_data
What should I do? I have tried the approach from "Replace exact substring in python", which wraps the word in word boundaries (r"\b...\b"):
new_words2 = re.sub(r"\bapple\b", "apple_new", words)
That works, but in my real project I have to replace about 100 patterns at once, so I would like to use the dict together with word boundaries like r"\bapple\b".
If you know how to do this, please tell me. Thanks.
Answer:
You can modify the pattern definition to:
pattern = re.compile(fr'\b(?:{"|".join(dic.keys())})\b')
# re.compile(r'\b(?:apple|cat|data)\b', re.UNICODE)
Output of re.sub(pattern, lambda m: dic[m.group(0)], words):
'apple_new pineapple cat_new category data_new old_data'
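For a dict with around 100 entries, the same idea can be wrapped in a small helper. This is a minimal sketch, assuming the keys are plain words; the function names replace_whole_words and replace_in_loop are hypothetical. It also contrasts the single compiled alternation with calling re.sub once per key: the loop makes one pass per key and rescans text that earlier replacements produced, so a mapping that swaps two words gives the wrong answer.

```python
import re

def replace_whole_words(text, mapping):
    """Replace every whole-word occurrence of a key in `mapping` with its value, in one pass."""
    # Escape keys so any regex metacharacters in them are matched literally.
    alternation = "|".join(map(re.escape, mapping))
    pattern = re.compile(rf"\b(?:{alternation})\b")
    # Each match is looked up in the dict; inserted replacements are never rescanned.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

def replace_in_loop(text, mapping):
    """Naive alternative: one re.sub call per key, so later keys see earlier output."""
    for key, value in mapping.items():
        text = re.sub(rf"\b{re.escape(key)}\b", value, text)
    return text

words = "apple pineapple cat category data old_data"
dic = {"apple": "apple_new", "cat": "cat_new", "data": "data_new"}
print(replace_whole_words(words, dic))  # apple_new pineapple cat_new category data_new old_data

swap = {"cat": "dog", "dog": "cat"}
print(replace_whole_words("cat dog", swap))  # dog cat
print(replace_in_loop("cat dog", swap))      # cat cat
```

The single-pass version only has to compile the pattern once per mapping, so if the dict does not change it can be compiled once and reused across many strings.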
Answer:
We can try placing word boundaries around your regex alternation to prevent unwanted substring matches. Strictly speaking, we should also apply re.escape to each entry in the alternation.
import re
words = "apple pineapple cat category data old_data"
dic = {"apple":"apple_new", "cat":"cat_new", "data":"data_new"}
# Escape each key, join the keys with |, and wrap the group in word boundaries
pattern = r'\b(' + '|'.join(re.escape(x) for x in dic) + r')\b'
new_words = re.sub(pattern, lambda m: dic[m.group(0)], words)
print(new_words) # apple_new pineapple cat_new category data_new old_data
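To see why re.escape matters, here is a small sketch with hypothetical keys that contain a dot, which is a regex wildcard when left unescaped. The sample text and keys are made up for illustration.

```python
import re

text = "v1.0 v1x0 c.at cat"
keys = ["v1.0", "c.at"]

# Without escaping, "." matches any character, so "v1x0" is matched by the key "v1.0".
unescaped = re.compile(r"\b(" + "|".join(keys) + r")\b")
# With re.escape, "." only matches a literal dot.
escaped = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

print(unescaped.findall(text))  # ['v1.0', 'v1x0', 'c.at']
print(escaped.findall(text))    # ['v1.0', 'c.at']
```

Note that \b only acts as a whole-word boundary when each key starts and ends with a word character; a key such as "c++" would need a different boundary, for example lookarounds.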