Update the set of stop words by adding the custom stop words

Time:04-13

We are given a default set of stop words and need to add some custom stop words to it, then remove all of these words from the given sentence to obtain the sentence without stop words.

I tried the code below, but it prints None. Please help!

sentence = 'Hello, good morning folks! Today we will announce the half yearly performance results of the company. Due to the ongoing COVID-19 pandemic, our profits have declined by 60% as compared to the last year'

stop_words = { "i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"}

    custom_stop_words = ["hello","good","morning","half","year"]
    updated_stop_words= list(stop_words).append(custom_stop_words)

    print(updated_stop_words)


Output:

None

CodePudding user response:

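The problem is that list.append modifies the list in place and returns None, so updated_stop_words ends up holding None. The temporary list created by list(stop_words) was never assigned to a variable, so there is nothing left to use afterwards. In addition, append adds custom_stop_words as a single nested element rather than adding each word.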
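A minimal sketch of that behaviour, using two short made-up lists (base and extra are illustrative names, not from the question):

```python
base = ["a", "the"]          # hypothetical stand-in for the stop words
extra = ["hello", "good"]    # hypothetical stand-in for the custom words

result = base.append(extra)  # append mutates base and returns None

print(result)  # None
print(base)    # ['a', 'the', ['hello', 'good']]  (extra is nested as one element)
```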

One way you could add the custom words to the existing stop words is to assign the list of stop_words to a variable, then use list.extend:

updated_stop_words= list(stop_words)
updated_stop_words.extend(custom_stop_words)

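However, since you will test for membership in a comprehension later on, it is more appropriate for updated_stop_words to be a set, because membership tests on a set are fast. Since stop_words is already a set, you can combine it with custom_stop_words using the union operator.

Then, in a list comprehension, you can check whether each word is in updated_stop_words and exclude it if it is. The word is lowercased and stripped of trailing punctuation only for the comparison, so the words that are kept retain their original form.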

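import string
# Union of the default stop words and the custom ones (both as sets)
updated_stop_words = stop_words | set(custom_stop_words)
# Keep a word only if its lowercased, punctuation-stripped form is not a stop word
out = [w for w in sentence.split() if w.lower().rstrip(string.punctuation) not in updated_stop_words]
print(out)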

Output:

['folks!', 'Today', 'announce', 'yearly', 'performance', 'results', 
 'company.', 'Due', 'ongoing', 'COVID-19', 'pandemic,', 'profits', 
 'declined', '60%', 'compared', 'last']
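Since the question asks for a sentence rather than a list, the filtered words can be joined back together with str.join. The sketch below is one way to do that; it uses a shortened stop word set and a shortened sentence purely to stay brief, so those values are assumptions, not the full data from the question.

```python
import string

# Shortened, illustrative subset of the default stop words
stop_words = {"we", "will", "the", "of", "to", "our", "have", "by", "as"}
custom_stop_words = ["hello", "good", "morning", "half", "year"]
sentence = ("Hello, good morning folks! Today we will announce the half "
            "yearly performance results of the company.")

updated_stop_words = stop_words | set(custom_stop_words)

# Filter as before, then rebuild a single string
kept = [w for w in sentence.split()
        if w.lower().rstrip(string.punctuation) not in updated_stop_words]
cleaned = " ".join(kept)

print(cleaned)  # folks! Today announce yearly performance results company.
```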

CodePudding user response:

The append method returns None; it adds an object to the original list in place. So, instead of assigning its result to another variable, call the method on the list and then print that list.

Also, I recommend using extend instead of append, so that each custom word is added to the stop words as a separate string rather than the whole list being appended as one element.

Try the code this way:

sentence = 'Hello, good morning folks! Today we will announce the half yearly performance results of the company. Due to the ongoing COVID-19 pandemic, our profits have declined by 60% as compared to the last year'

stop_words = [ "i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]

custom_stop_words = ["hello","good","morning","half","year"]
stop_words.extend(custom_stop_words)

print(stop_words)
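If you would rather keep stop_words as a set, as in the question, set.update is the in-place counterpart of list.extend. A small sketch, with a shortened set used only for illustration:

```python
stop_words = {"i", "me", "my", "the"}   # shortened, illustrative subset
custom_stop_words = ["hello", "good", "morning", "half", "year"]

# update adds each element of the iterable to the set in place;
# like extend, it returns None, so do not assign its result
stop_words.update(custom_stop_words)

print(sorted(stop_words))
```

Sorting is only there to make the printed order predictable, since sets are unordered.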

Note:

There is a missing double quote before the word himself in your stop words set. You also need to remove the indentation from the lines after it. Given that your code runs and prints None, I assume these problems are only typos in the question and not in the code you actually ran.
