How to replace full words versus substrings

Time:11-11

I am working on a problem where I am using a dictionary to replace certain words in a string.

This is my code for a minimal example:

dictionary = {"happy": 'YAY!', "happybday": "PARTY!"}

string = "There were so many happy people exclaiming happybday!"

for old_word, new_word in dictionary.items():
    string = string.replace(old_word, new_word)

print(string)

The problem here is I am getting the output:

There were so many YAY! people exclaiming YAY!bday!

Desired output:

There were so many YAY! people exclaiming PARTY!!

Clearly, while iterating through the dictionary, the code first sees "happy", treats it as a substring, and replaces every occurrence of that substring. This is a problem in the second case, where the substring is part of a larger word that should be replaced as a whole. Each iteration only looks at the substring level.

Does anyone have any ideas on how I might fix this? I thought of reordering the items in the dictionary (so that the longer strings come first), but this does not seem like the best solution. I could also split the string into a list of words and compare and replace based on that, but since this might take more time, I thought that might not be the best solution either.

I think part of the problem might be occurring because of using the "in" keyword here, but I am not entirely sure.

Any advice? Any links to explanations, tutorials, or examples would be greatly appreciated. I am really trying to understand not only what to fix but what is wrong in the first place. Thanks.

CodePudding user response:

You can build a regex alternation of the search keys, wrapped in \b word boundaries so that only whole words match, and sorted by descending length so that longer, more specific terms are tried first.

import re

dictionary = {"happy": 'YAY!', "happybday": "PARTY!"}
string = "There were so many happy people exclaiming happybday!"

# Try longer keys first; re.escape protects keys that contain regex metacharacters.
keys = sorted(dictionary, key=len, reverse=True)
regex = r'\b(?:' + '|'.join(map(re.escape, keys)) + r')\b'
output = re.sub(regex, lambda m: dictionary[m.group()], string)
print(output)

# There were so many YAY! people exclaiming PARTY!!

Longer keys are tried before shorter ones because, when we encounter happybday, we want to match it as a whole before happy gets a chance to match.
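As a minimal sketch of the same idea, the pattern can be wrapped in a small helper. The name replace_whole_words and the "happy birthday" key are my own illustrations, not part of the question. With single-word keys the \b anchors already stop happy from matching inside happybday, so the length ordering matters most when one key is a whole-word prefix of another, as in this example:

```python
import re

def replace_whole_words(text, mapping):
    """Replace whole-word occurrences of each key in `mapping` in a single pass."""
    if not mapping:
        return text  # an empty alternation would match at every word boundary
    # Longest keys first, so "happy birthday" is tried before "happy".
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape guards against keys that contain regex metacharacters.
    pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b'
    return re.sub(pattern, lambda m: mapping[m.group()], text)

# Hypothetical mapping where one key is a prefix of another at a word boundary.
mapping = {"happy": "YAY!", "happy birthday": "PARTY!"}
print(replace_whole_words("A happy crowd sang happy birthday", mapping))
# A YAY! crowd sang PARTY!
```

Without the sort, happy could win at the start of "happy birthday" and leave " birthday" behind, depending on the dictionary's insertion order.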

CodePudding user response:

You can process the keys in their intended order of precedence. In your example, sorting by key length, from longest to shortest, should be enough.

dictionary = {"happy": 'YAY!', "happybday": "PARTY!"}

string = "There were so many happy people exclaiming happybday!"

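def by_length(item):
    # Sort key: the length of the dictionary key (the word being replaced).
    key, value = item
    return len(key)

# Replace longer keys first so "happybday" is handled before "happy".
for old_word, new_word in sorted(dictionary.items(), key=by_length, reverse=True):
    string = string.replace(old_word, new_word)

print(string)
# There were so many YAY! people exclaiming PARTY!!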
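One caveat: str.replace still works on substrings, so ordering by length fixes the happybday case but not a word such as unhappy. The sketch below contrasts the loop above with the word-by-word lookup the question mentions. Both helper names are hypothetical, and the token version assumes every key is a single run of word characters:

```python
import re

def replace_sequentially(text, mapping):
    """The longest-first str.replace loop, wrapped in a helper for comparison."""
    for old, new in sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True):
        text = text.replace(old, new)
    return text

def replace_tokens(text, mapping):
    """Look up each run of word characters in `mapping`; leave unknown words alone."""
    return re.sub(r'\w+', lambda m: mapping.get(m.group(), m.group()), text)

mapping = {"happy": "YAY!", "happybday": "PARTY!"}
text = "An unhappy guest said happybday!"

print(replace_sequentially(text, mapping))  # An unYAY! guest said PARTY!!
print(replace_tokens(text, mapping))        # An unhappy guest said PARTY!!
```

The token version makes one pass over the string with a dictionary lookup per word, so it does not need the keys sorted at all, but it cannot handle keys that contain spaces or punctuation.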