I have hundreds of thousands of original sentences and a lookup table in the form of a dictionary. I need to find every key that occurs in any sentence and replace it with the corresponding value.
For example, the original sentences and the lookup table are
sentences = ['Seoul is a beautiful place', 'I want to visit Paris', 'New York New York',
'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}
The desired result is as follows.
['Korea is a beautiful place', 'I want to visit France', 'United States United States',
'Between France and United States']
What I tried is the code below.
for i in range(len(sentences)):
    sentence1 = sentences[i]
    for key in lookup.keys():
        sentence1 = sentence1.replace(key, lookup[key])
    sentences[i] = sentence1
I'm concerned that the nested loops may take too much time. Is this the best way to do it? Is there a faster or more elegant way to accomplish this?
CodePudding user response:
You could use re.sub with a callback function. Form a regex alternation of the city keys, and then do the lookup in the callback.
import re

sentences = ['Seoul is a beautiful place', 'I want to visit Paris', 'New York New York', 'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}
# Build \b(?:Paris|New\ York|Seoul)\b, escaping each key so it is matched literally
regex = r'\b(?:' + '|'.join(re.escape(x) for x in lookup.keys()) + r')\b'
output = [re.sub(regex, lambda m: lookup[m.group()], x) for x in sentences]
print(output)
This prints:
['Korea is a beautiful place',
'I want to visit France',
'United States United States',
'Between France and United States']
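With hundreds of thousands of sentences, it may also help to compile the pattern once and reuse it. The following is only a sketch under a few assumptions: build_replacer is a hypothetical helper name, not something from the answer above, and the keys are sorted longest first so that if one key is a prefix of another (say 'New York' and 'New York City'), the longer key is tried first.

```python
import re

def build_replacer(lookup):
    """Return a function that replaces every key of `lookup` in one pass.

    `build_replacer` is a hypothetical name used only for illustration.
    """
    # Longest keys first, so a longer key wins over a key that is its prefix.
    keys = sorted(lookup, key=len, reverse=True)
    # Compile once; the same pattern object is reused for every sentence.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    return lambda text: pattern.sub(lambda m: lookup[m.group()], text)

sentences = ['Seoul is a beautiful place', 'I want to visit Paris',
             'New York New York', 'Between Paris and New York']
lookup = {'Paris': 'France', 'New York': 'United States', 'Seoul': 'Korea'}

replace_all = build_replacer(lookup)
output = [replace_all(s) for s in sentences]
print(output)
```

Each sentence is scanned once regardless of how many keys there are, which is the main reason this approach tends to scale better than one str.replace call per key. Actual timings depend on the data, so it is worth measuring on a sample.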
CodePudding user response:
You just have to loop through all the sentences and replace every key in each one:
sentences_corrected = []
for sentence in sentences:
    for key, substitution in lookup.items():
        sentence = sentence.replace(key, substitution)
    sentences_corrected.append(sentence)
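One behavioral difference is worth knowing before choosing between the two approaches: chained str.replace calls operate on the result of earlier replacements, while a single regex pass only looks at the original text. A small sketch, using a made-up lookup table (an assumption for illustration, not the one from the question) in which one value is also a key:

```python
import re

# Hypothetical table: the value 'France' is itself a key.
lookup = {'Paris': 'France', 'France': 'Europe'}
sentence = 'I want to visit Paris'

# Chained replace: the second replacement rewrites the output of the first.
chained = sentence
for key, substitution in lookup.items():
    chained = chained.replace(key, substitution)

# Single regex pass: each match in the original text is replaced exactly once.
pattern = re.compile('|'.join(re.escape(k) for k in lookup))
single_pass = pattern.sub(lambda m: lookup[m.group()], sentence)

print(chained)      # I want to visit Europe
print(single_pass)  # I want to visit France
```

With the lookup table from the question no value is also a key, so both approaches give the same result there. Note also that str.replace has no notion of word boundaries, so a key can match inside a longer word.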