Find and replace a string in a sentence to another string in a list of sentences using a dictionary

Time:05-11

I have hundreds of thousands of original sentences and a lookup table in the form of a dictionary. I need to find every key that occurs in any sentence and replace it with the value of the corresponding key.

For example, the original sentences and the lookup table are

sentences = ['Seoul is a beautiful place', 'I want to visit Paris', 'New York New York',
 'Between Paris and New York'] 

lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'} 

The desired result is as follows.

['Korea is a beautiful place', 'I want to visit France', 'United States United States', 
'Between France and United States']

What I tried is the below code.

for i in range(len(sentences)):
    sentence1 = sentences[i]
    for key in lookup.keys():
        sentence1 = sentence1.replace(key, lookup[key])
    sentences[i] = sentence1

I'm concerned that the nested loops may take too much time. Is this the best way to do it? Is there a faster or more elegant way to accomplish this?

CodePudding user response:

You could use re.sub with a callback function. Form a regex alternation of the city keys, and then do the lookup in the callback.

import re

sentences = ['Seoul is a beautiful place', 'I want to visit Paris', 'New York New York', 'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}
# Build one alternation of all escaped keys, anchored on word boundaries
regex = r'\b(?:' + r'|'.join([re.escape(x) for x in lookup.keys()]) + r')\b'
# The callback looks up the replacement for whichever key matched
output = [re.sub(regex, lambda m: lookup[m.group()], x) for x in sentences]
print(output)

This prints:

['Korea is a beautiful place',
 'I want to visit France',
 'United States United States',
 'Between France and United States']
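
With hundreds of thousands of sentences, it may help to compile the pattern once and reuse it. The sketch below is one possible variant of the approach above, not a benchmarked solution. The helper name replace_all is hypothetical, and sorting the keys longest first is an added assumption: it only matters if one key is a prefix of another (for example 'New York' and 'New York City'), because the regex alternation takes the first alternative that matches.

```python
import re

def replace_all(sentences, lookup):
    # Longest keys first, so a longer key wins over a shorter key that is its prefix
    keys = sorted(lookup, key=len, reverse=True)
    # Compile once; the same pattern object is reused for every sentence
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # Each sentence is scanned in a single pass; the callback picks the replacement
    return [pattern.sub(lambda m: lookup[m.group()], s) for s in sentences]

sentences = ['Seoul is a beautiful place', 'I want to visit Paris',
             'New York New York', 'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}
output = replace_all(sentences, lookup)
print(output)
```

Each sentence is then scanned once regardless of how many keys the dictionary holds, rather than once per key.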

CodePudding user response:

You just have to loop through all the sentences and replace every key:

sentences_corrected = []
for sentence in sentences:
    for key, substitution in lookup.items():
        sentence = sentence.replace(key, substitution)
    sentences_corrected.append(sentence) 
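
One caveat with chained str.replace calls: each call works on the result of the previous one, so text inserted for one key can be rewritten again by a later key. The sketch below illustrates this with a hypothetical lookup (not the one from the question) in which one value is also a key, and compares the result with a single-pass re.sub. With the lookup from the question the two approaches give the same result.

```python
import re

# Hypothetical lookup where the value 'Seoul' is itself a key
lookup = {'Paris': 'Seoul', 'Seoul': 'Korea'}
sentence = 'I want to visit Paris'

# Chained replace: 'Paris' becomes 'Seoul', which the next pass turns into 'Korea'
chained = sentence
for key, substitution in lookup.items():
    chained = chained.replace(key, substitution)

# Single pass: every match is replaced exactly once, based on the original text
pattern = re.compile('|'.join(re.escape(k) for k in lookup))
single_pass = pattern.sub(lambda m: lookup[m.group()], sentence)

print(chained)      # I want to visit Korea
print(single_pass)  # I want to visit Seoul
```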