I have a list of strings. I tokenized it and then counted the top 30 most frequent words. Then I found out that some words should be removed.
I have a list of keywords that I want to remove from the list of strings. For example:
z = ['if teardrops could be bottled', 'i dont wanna be you anymore', 'i love cats']
Then I join it into a single string and tokenize it. Here is the result:
['if', 'teardrops', 'could', 'be', 'bottled', 'i', 'dont', 'wanna', 'be', 'you', 'anymore', 'i', 'love', 'cats']
I want to remove the keywords 'cats' and 'i' from it, so I created a list:
zzz_rem = ['cats', 'i']
Here is the code I wrote after that:
zzz_result = [d for d in zzz if zzz not in zzz_rem]
zzz_resils = ' '.join(zzz_result)
But the output remains the same:
['if', 'teardrops', 'could', 'be', 'bottled', 'i', 'dont', 'wanna', 'be', 'you', 'anymore', 'i', 'love', 'cats']
The keywords in the list I created are not removed. What should I do?
P.S. I am new to Python. Thank you.
Answer:
One approach is to use a regular expression to remove the words directly from the original sentences, before tokenizing:
import re
z = ['if teardrops could be bottled', 'i dont wanna be you anymore', 'i love cats']
zzz_rem = ['cats', 'i']
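# \b...\b matches whole words only, so 'i' does not match inside 'if'; re.escape guards against regex metacharacters
pattern = re.compile(rf"\b({'|'.join(map(re.escape, zzz_rem))})\b")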
res = [pattern.sub("", zi).strip() for zi in z]
print(res)
Output
['if teardrops could be bottled', 'dont wanna be you anymore', 'love']
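When a removed word sits in the middle of a sentence, the substitution above leaves a double space behind, and .strip() only trims the ends. A minimal sketch, assuming you also want to collapse that leftover whitespace (the helper name remove_words is hypothetical, not from the original answer):

```python
import re

def remove_words(sentences, words):
    # Whole-word alternation; re.escape guards against words containing regex metacharacters
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    cleaned = []
    for s in sentences:
        s = pattern.sub("", s)
        # split() with no arguments splits on runs of whitespace, so join() collapses them
        cleaned.append(" ".join(s.split()))
    return cleaned

z = ['if teardrops could be bottled', 'i dont wanna be you anymore', 'i love cats']
print(remove_words(z, ['cats', 'i']))
# → ['if teardrops could be bottled', 'dont wanna be you anymore', 'love']
print(remove_words(['now i love cats too'], ['cats', 'i']))
# → ['now love too']
```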
Answer:
You have defined the list:
zzz = ['if', 'teardrops', 'could', 'be', 'bottled', 'i', 'dont', 'wanna', 'be', 'you', 'anymore', 'i', 'love', 'cats']
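Then you are checking whether the whole list zzz is not in zzz_rem. zzz_rem contains strings, never a list, so that condition is always True and every word is kept. What you actually want to check is whether each element d of the list is not in zzz_rem. So the line: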
zzz_result = [d for d in zzz if zzz not in zzz_rem]
becomes:
zzz_result = [d for d in zzz if d not in zzz_rem]
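Putting the fix in context, here is a minimal sketch of the whole pipeline from the question. It assumes tokenizing means joining the sentences and splitting on whitespace, and it uses collections.Counter for the top-30 count mentioned in the question; the name stop is hypothetical. Converting zzz_rem to a set is optional but makes each membership test fast when the keyword list grows.

```python
from collections import Counter

z = ['if teardrops could be bottled', 'i dont wanna be you anymore', 'i love cats']
zzz_rem = ['cats', 'i']

# Assumed tokenizer: join the sentences, then split on whitespace
zzz = ' '.join(z).split()

# The buggy condition tests the whole list, which is never an element of zzz_rem
print(zzz not in zzz_rem)  # True, so nothing gets filtered out

# Test each word d, not the list zzz; a set gives fast membership checks
stop = set(zzz_rem)
zzz_result = [d for d in zzz if d not in stop]
print(zzz_result)
# → ['if', 'teardrops', 'could', 'be', 'bottled', 'dont', 'wanna', 'be', 'you', 'anymore', 'love']

# Count the most frequent remaining words (top 30, as in the question)
print(Counter(zzz_result).most_common(30))
```

If your tokens can contain capitals (for example 'I'), compare d.lower() against a lowercase keyword set instead.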