I am trying to replace certain substrings in the strings of a list, using a second list that holds the substrings to replace.
- Check each string in "s".
- If any word from "invalid_list" appears in a string, it should be replaced with xyz.
The outcome for "s" should be:
['123xyz', '456xyz', '789xyz']
s = ['123xyz', '456xye', '789xyf']
invalid_list = ['xye', 'xyf']
for i in invalid_list:
    if i in s:
        s = s.replace(i, 'xyz')
print(s)
Current (incorrect) output:
['123xyz', '456xye', '789xyf']
Answer:
Iterate over invalid_list and, for each invalid substring, rebuild the list by calling the built-in str.replace() method on every element:
for i in invalid_list:
    s = [string.replace(i, 'xyz') for string in s]
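Combined with the question's data, a minimal self-contained run of this approach might look like the following. Each pass of the outer loop builds a new list in which one invalid substring has been replaced in every element.

```python
s = ['123xyz', '456xye', '789xyf']
invalid_list = ['xye', 'xyf']

# One pass per invalid substring; each pass rebuilds the whole list.
for i in invalid_list:
    s = [string.replace(i, 'xyz') for string in s]

print(s)  # ['123xyz', '456xyz', '789xyz']
```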
Answer:
You need a second loop to take each string out of the list individually; inside it, your existing loop can check whether any of the invalid substrings occur in that string.
You also need to write the changed string back into the list.
s = ['123xyz', '456xye', '789xyf']
invalid_list = ['xye', 'xyf']
for index, element in enumerate(s):
    for i in invalid_list:
        if i in element:
            element = element.replace(i, 'xyz')
    s[index] = element
print(s)
This produces the requested output: ['123xyz', '456xyz', '789xyz']
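The write-back step matters because element is only a name bound to a string. Strings are immutable, so replace() returns a new string, and rebinding the loop variable leaves the list untouched. A small comparison, using a one-item list chosen for illustration:

```python
s = ['456xye']

# Rebinding the loop variable does not change the list.
for element in s:
    element = element.replace('xye', 'xyz')
unchanged = list(s)  # still ['456xye']

# Writing the new string back by index does change it.
for index, element in enumerate(s):
    s[index] = element.replace('xye', 'xyz')

print(unchanged, s)  # ['456xye'] ['456xyz']
```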
Answer:
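i in s checks whether i is equal to one of the list's elements; it does not search for substrings inside them. That is why the condition is never true and s comes back unchanged. Also, replace() is a string method, not a list method, so s.replace(i, 'xyz') would raise an AttributeError if it were ever reached; you have to call it on each string.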
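The difference between membership in a list and membership in a string can be checked directly in the interpreter; this sketch reuses the question's list.

```python
s = ['123xyz', '456xye', '789xyf']

# `in` on a list compares whole elements for equality.
print('xye' in s)                        # False
# `in` on a string checks for a substring.
print('xye' in '456xye')                 # True
# To ask "does any element contain it?", test each element.
print(any('xye' in item for item in s))  # True

# Lists have no replace() method; strings do.
print(hasattr(s, 'replace'))             # False
print(hasattr('456xye', 'replace'))      # True
```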
You can write a list comprehension to create the updated list. Move the code that replaces all the invalid strings into a function that you can call from the list comprehension.
def replace_invalid(string, invalid, replacement):
    # Apply each replacement in turn to this one string.
    for substring in invalid:
        string = string.replace(substring, replacement)
    return string

s = [replace_invalid(item, invalid_list, 'xyz') for item in s]
Answer:
Looping over the items of invalid_list means calling replace() once per invalid word on every string, so each string is scanned len(invalid_list) times.
An alternative is to combine all the invalid words into one regex and substitute them with a single call per string:
import re

s = ['123xyz', '456xye', '789xyf']
invalid_list = ['xye', 'xyf']
# Build one alternation such as "xye|xyf"; re.escape() protects any special characters.
regex = re.compile('|'.join(map(re.escape, invalid_list)))
s2 = [regex.sub('xyz', x) for x in s]
Output:
['123xyz', '456xyz', '789xyz']
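Besides needing only one call per string, a single regex substitution differs from chained replace() calls in one observable way: re.sub() scans the original string once and does not rescan the text it inserts, whereas each later replace() call sees the output of the earlier ones. The invalid list below is hypothetical, chosen so that 'yz' also occurs inside the replacement 'xyz'.

```python
import re

s = ['456xye']
invalid_list = ['xye', 'yz']  # hypothetical: 'yz' also occurs in the replacement 'xyz'

# Chained replace(): the second call also hits the 'yz' produced by the first.
chained = []
for item in s:
    for bad in invalid_list:
        item = item.replace(bad, 'xyz')
    chained.append(item)

# Single regex pass: inserted text is never rescanned.
regex = re.compile('|'.join(map(re.escape, invalid_list)))
single_pass = [regex.sub('xyz', item) for item in s]

print(chained)      # ['456xxyz']
print(single_pass)  # ['456xyz']
```

With the question's own invalid_list (['xye', 'xyf']) the two approaches give the same result, since neither word can be produced by the replacement.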
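To avoid matching part of a longer word, add a word boundary (\b) after the alternation. Use a raw f-string (rf"..."): in an ordinary string literal "\b" is a backspace character, not a regex word boundary, so the pattern would never match.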
import re

s = ['123xyz', '456xye', '789xyf']
invalid_list = ['xy', 'xye', 'xyf']
# \b requires the match to end at a word boundary, so 'xy' inside '123xyz' is skipped.
regex = re.compile(rf"({'|'.join(map(re.escape, invalid_list))})\b")
s2 = [regex.sub('xyz', x) for x in s]
# ['123xyz', '456xyz', '789xyz']
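For comparison, here is the same list without the word boundary. Python's re tries the alternatives from left to right and accepts the first one that matches at a position, so 'xy' wins even where 'xye' or 'xyf' would also match, and it also matches inside '123xyz'. With \b, the 'xy' branch fails the boundary test inside a longer word and the engine backtracks to try the other alternatives.

```python
import re

s = ['123xyz', '456xye', '789xyf']
invalid_list = ['xy', 'xye', 'xyf']

# Same alternation, but without the trailing word boundary.
regex = re.compile('|'.join(map(re.escape, invalid_list)))
s2 = [regex.sub('xyz', x) for x in s]
print(s2)  # ['123xyzz', '456xyze', '789xyzf']
```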