Loop through list of strings, remove all banned words from each string item

I have the following list:

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]

This is a list of words that I want to remove from each of the string items in the list:

bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']

The resulting list that I am trying to generate is this:

cleaner_list = ["lemons", "cheddar cheese", "carrots"]

So far, I have been unable to achieve this. My attempt is as follows:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
    
def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\W", re.I)
    return pattern.sub("", ing)
    
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
    
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I have also tried:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []

bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + ")\\W", re.I)

def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)

for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
  
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I'm a bit lost at this point and not sure where I'm going wrong. Any help is much appreciated.

CodePudding user response：

Some issues:

The final \W in your regex requires a character to follow the banned word, so the match fails when the banned word is the last word in the input string. You could just use \b again, as you did at the start of the regex.
Since you want to remove the comma as well, you need to add it as an option. Make sure not to put it inside that same capture group, because the \b at the end would then require the comma to be followed by an alphanumeric character. So it should be added as an option at the very end (or start) of your regex.
You might want to call .strip() on the resulting string to remove any whitespace left over after the banned words have been removed.

So:

def RemoveBannedWords(ing):
    pattern = re.compile("\\b(grated|zested|thinly|chopped)\\b|,", re.I)
    return pattern.sub("", ing).strip()
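
The first issue is easy to check by running both patterns side by side. This is only an illustrative sketch: the names with_nonword and with_boundary are made up for the comparison and do not appear in the original code.

```python
import re

# Hypothetical names, used only for this comparison.
with_nonword = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)
with_boundary = re.compile(r"\b(grated|zested|thinly|chopped)\b|,", re.I)

sample = "lemons zested"

# \W needs a character after the word, so a banned word
# at the very end of the string is not matched.
print(repr(with_nonword.sub("", sample)))           # 'lemons zested'

# \b also matches at the end of the string, so the word is removed.
print(repr(with_boundary.sub("", sample).strip()))  # 'lemons'
```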

CodePudding user response：

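def clearList(dirtyList, bannedWords, splitChar):
    clean = []
    for dirty in dirtyList:
        # Split into words and drop a leading or trailing comma from each one
        words = [w.strip(",") for w in dirty.split(splitChar)]
        # Keep only the non-empty words that are not banned
        kept = [w for w in words if w and w not in bannedWords]
        clean.append(splitChar.join(kept))

    return clean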

dirtyList is the list that you want to clean

bannedWords are the words that you don't want

splitChar is the character that separates the words (" ")

CodePudding user response：

I would remove "," from the bannedWord list and use str.strip to strip it:

import re

dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]

bannedWord = ["grated", "zested", "thinly", "chopped"]

# Group the alternatives so that \b applies to every banned word
pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)

for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))

Prints:

lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots
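
One detail to watch when joining alternatives between two \b anchors: alternation has the lowest precedence in a regex, so without a group the anchors attach only to the first and last alternative. The sketch below illustrates the difference. The input string is made up (it is not from the question), and the names alternatives, ungrouped and grouped are hypothetical.

```python
import re

bannedWord = ["grated", "zested", "thinly", "chopped"]
alternatives = "|".join(re.escape(w) for w in bannedWord)

# Hypothetical names for the two variants being compared.
ungrouped = re.compile(r"\b" + alternatives + r"\b", flags=re.I)
grouped = re.compile(r"\b(?:" + alternatives + r")\b", flags=re.I)

text = "unzested lemons"  # made-up input, not from the question

# Without the group, "zested" is matched even inside a longer word.
print(ungrouped.sub("", text))  # un lemons

# With the group, both anchors apply to every alternative.
print(grouped.sub("", text))    # unzested lemons
```

For the three strings in the question both variants give the same result; the difference only shows up when a banned word appears inside another word.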

CodePudding user response：

The code below seems to work (a naive nested loop):

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)

output

['lemons', 'cheddar cheese', 'carrots']
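
The loop gives the expected result for this input, but two caveats apply to plain str.replace: it also removes a banned word that occurs inside a longer word, and it can leave a double space in the middle of a string. Here is a minimal sketch of both effects, using made-up inputs and a hypothetical helper named naive_clean. The last line shows one common way to collapse repeated whitespace.

```python
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']

def naive_clean(text):
    # Hypothetical helper that mirrors the nested loop above.
    for bannedWord in bannedWords:
        text = text.replace(bannedWord, '')
    return text.strip()

# Made-up inputs, not from the question.
print(repr(naive_clean("unchopped parsley")))       # 'un parsley'
print(repr(naive_clean("cheddar, grated cheese")))  # 'cheddar  cheese'

# split() with no arguments splits on any run of whitespace,
# so joining the pieces again collapses the double space.
print(" ".join(naive_clean("cheddar, grated cheese").split()))  # cheddar cheese
```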