How to remove keywords from a list of strings in Python

I have a list of strings. I tokenized it and then found the top 30 most frequent words in the list. Then I found out there were some words that should be removed.

I have a list of keywords that I want to remove from a list of strings. For example:

z = ['if teardrops could be bottled', 'i dont wanna be you anymore', 'i love cats']

Then I join it into a single string and tokenize it. Here is the result: ['if', 'teardrops', 'could', 'be', 'bottled', 'i', 'dont', 'wanna', 'be', 'you', 'anymore', 'i', 'love', 'cats']. I want to remove the keywords 'cats' and 'i' from it, so I created a list:
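The join, tokenize, and top-30 count steps described above could look something like this minimal sketch. It assumes plain whitespace tokenization with str.split() (the question doesn't say which tokenizer was used) and uses collections.Counter for the count; the names `tokens` and `top` are illustrative.

```python
from collections import Counter

z = ['if teardrops could be bottled', 'i dont wanna be you anymore', 'i love cats']

# Join the sentences into one string, then split on whitespace
# (assumption: simple whitespace tokenization, not an NLP tokenizer)
tokens = ' '.join(z).split()
print(tokens)

# Count every token and keep the 30 most frequent (ties keep first-seen order)
top = Counter(tokens).most_common(30)
print(top[:3])
```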

zzz_rem = ['cats', 'i']

Here's the code I wrote after that:

zzz_result = [d for d in zzz if zzz not in zzz_rem]
zzz_resils = ' '.join(zzz_result)

But the output remains the same: ['if', 'teardrops', 'could', 'be', 'bottled', 'i', 'dont', 'wanna', 'be', 'you', 'anymore', 'i', 'love', 'cats']

The keywords in the list I created aren't removed from that list. What should I do?

P.S. I am new to Python. Thank you.

Answer:

One approach is to use a regular expression to remove the words from the sentences:

import re

z = ['if teardrops could be bottled', 'i dont wanna be you anymore', 'i love cats']

zzz_rem = ['cats', 'i']

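# \b limits matches to whole words; re.escape guards against regex metacharacters in keywords
pattern = re.compile(rf"\b(?:{'|'.join(map(re.escape, zzz_rem))})\b")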

res = [pattern.sub("", zi).strip() for zi in z]
print(res)

Output

['if teardrops could be bottled', 'dont wanna be you anymore', 'love']
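One side effect of the regex approach: .strip() only trims the ends of each sentence, so removing a keyword from the middle of a sentence leaves a double space behind. A minimal alternative sketch, assuming the sentences are plain whitespace-separated words, splits each sentence, filters the tokens, and joins them back; `remove_keywords` is a hypothetical helper name, not part of the answer above.

```python
def remove_keywords(sentences, keywords):
    """Drop whole-token keywords from each sentence (whitespace tokenization assumed)."""
    banned = set(keywords)  # set membership checks are O(1) on average
    return [' '.join(w for w in s.split() if w not in banned) for s in sentences]

z = ['if teardrops could be bottled', 'i dont wanna be you anymore', 'i love cats']
print(remove_keywords(z, ['cats', 'i']))
# A keyword in the middle no longer leaves a double space
print(remove_keywords(['you and i and cats'], ['i']))
```

Unlike the \b regex, this only removes exact tokens, so a word with attached punctuation such as 'cats.' would be kept.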

Answer:

You have defined the list:

zzz = ['if', 'teardrops', 'could', 'be', 'bottled', 'i', 'dont', 'wanna', 'be', 'you', 'anymore', 'i', 'love', 'cats']

Then you are checking whether the whole list zzz is not in zzz_rem. That is always true, so nothing gets filtered out. What you actually want to check is whether each element d of the list is not in zzz_rem. So the line:

zzz_result = [d for d in zzz if zzz not in zzz_rem]

should become:

zzz_result = [d for d in zzz if d not in zzz_rem]
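A small runnable sketch of the explanation above: it shows that the original condition is always True, then applies the corrected filter and the join from the question. Converting zzz_rem to a set is an optional tweak (not part of the answer) that makes each membership check constant-time on average.

```python
zzz = ['if', 'teardrops', 'could', 'be', 'bottled', 'i', 'dont', 'wanna',
       'be', 'you', 'anymore', 'i', 'love', 'cats']
zzz_rem = ['cats', 'i']

# Original condition: tests the whole list, not each word, so it is always True
print(zzz not in zzz_rem)

# Corrected filter: test each element d against the keywords
rem = set(zzz_rem)  # optional: faster lookups than a list
zzz_result = [d for d in zzz if d not in rem]
zzz_resils = ' '.join(zzz_result)
print(zzz_result)
print(zzz_resils)
```

If the tokens might contain an uppercase 'I', compare d.lower() against lowercase keywords instead of d.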