I have the following list:
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
This is a list of words that I want to remove from each of the string items in the list:
bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']
The resulting list that I am trying to generate is this:
cleaner_list = ["lemons", "cheddar cheese", "carrots"]
So far, I have been unable to achieve this. My attempt is as follows:
import re
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
print(cleaner_list)
This returns:
['lemons zested', 'cheddar cheese', 'carrots, chopped']
I have also tried:
import re
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)
def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)
for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
print(cleaner_list)
This returns:
['lemons zested', 'cheddar cheese', 'carrots, chopped']
I'm a bit lost at this point and not sure where I'm going wrong. Any help is much appreciated.
CodePudding user response:
Some issues:
The final \W in your regex requires that a character follows the banned word. If the banned word is the last word in the input string, the match fails. You could use \b again, as you did at the start of the regex.
Since you want to remove the comma as well, you need to add it as an option. Don't put it inside the same capture group, because the \\b at the end would then require the comma to be followed by an alphanumeric character. Put it as a separate alternative at the very end (or start) of your regex.
You might want to call .strip() on the resulting string to remove any whitespace that remains after the banned words have been removed.
So:
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\b|,", re.I)
    return pattern.sub("", ing).strip()
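To make the difference between the two patterns concrete, here is a small side-by-side check on the item that failed in the question. The variable names are only illustrative and are not part of the original code.

```python
import re

# Original pattern: the trailing \W needs one more character after the word.
needs_following_char = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)
# Suggested pattern: \b is zero-width, so it also matches at the end of the string.
word_boundary = re.compile(r"\b(grated|zested|thinly|chopped)\b|,", re.I)

sample = "lemons zested"
unchanged = needs_following_char.sub("", sample)
cleaned = word_boundary.sub("", sample).strip()

print(repr(unchanged))  # 'lemons zested'
print(repr(cleaned))    # 'lemons'
```

Because "zested" is the last word, there is nothing left for \W to consume, so the first pattern leaves the string untouched.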
CodePudding user response:
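def clearList(dirtyList, bannedWords, splitChar):
    clean = []
    for dirty in dirtyList:
        ban = False
        # Split on splitChar and flag the item if any word is banned.
        for w in dirty.split(splitChar):
            if w in bannedWords:
                ban = True
        # Keep only items that contain no banned word.
        if ban is False:
            clean.append(dirty)
    return clean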
dirtyList is the list that you want to clean
bannedWords are the words that you don't want
splitChar is the character that separates the words (" ")
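Note that the function above keeps or drops whole items, so every item in the question's list would be dropped. If the goal is to remove the banned words from inside each item, a split-based variant might look like the sketch below. The name strip_banned_words is hypothetical, and stripping commas from every word is an assumption that fits the example data.

```python
def strip_banned_words(items, banned_words, split_char=" "):
    """Return a copy of items with banned words removed from each string."""
    banned = {w.lower() for w in banned_words}
    cleaned = []
    for item in items:
        kept = []
        for word in item.split(split_char):
            # Compare without surrounding commas so "carrots," is treated as "carrots".
            bare = word.strip(",")
            if bare and bare.lower() not in banned:
                kept.append(bare)
        cleaned.append(split_char.join(kept))
    return cleaned


dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
banned_word = ["grated", "zested", "thinly", "chopped"]
result = strip_banned_words(dirtylist, banned_word)
print(result)  # ['lemons', 'cheddar cheese', 'carrots']
```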
CodePudding user response:
I would remove , from the bannedWord list and use str.strip to strip it:
import re
dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]
bannedWord = ["grated", "zested", "thinly", "chopped"]
# Group the alternatives so that \b applies to every word, not just the first and last.
pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)
for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))
Prints:
lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots
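When a banned word sits in the middle of an item, removing it leaves a run of spaces that strip does not touch. A possible way to handle that is to collapse whitespace before stripping, as in this sketch. The helper name clean_items and the sample item are assumptions for illustration.

```python
import re


def clean_items(items, banned_words):
    """Remove banned words, then tidy leftover spaces and commas."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in banned_words) + r")\b",
        flags=re.I,
    )
    cleaned = []
    for item in items:
        text = pattern.sub("", item)
        # Collapse runs of whitespace left behind by removed words.
        text = re.sub(r"\s+", " ", text)
        cleaned.append(text.strip(" ,"))
    return cleaned


result = clean_items(["carrots thinly chopped and peeled"], ["thinly", "chopped"])
print(result)  # ['carrots and peeled']
```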
CodePudding user response:
The following seems to work (a naive nested loop):
dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)
Output:
['lemons', 'cheddar cheese', 'carrots']
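One caveat with the loop above: str.replace matches substrings, not whole words. The sketch below compares it with a \b-anchored regex on a made-up item ("ungrated parmesan" is a hypothetical input, not from the question).

```python
import re

banned_words = ["grated", "chopped"]
item = "ungrated parmesan"

# str.replace matches substrings, so part of "ungrated" is removed.
by_replace = item
for word in banned_words:
    by_replace = by_replace.replace(word, "")

# A \b-anchored pattern only matches whole words, so the item is left alone.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, banned_words)) + r")\b")
by_regex = pattern.sub("", item).strip()

print(by_replace)  # un parmesan
print(by_regex)    # ungrated parmesan
```

For the list in the question both approaches give the same result; the difference only shows up when a banned word appears inside a longer word.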