Unable to implement nltk.stopwords

Time:07-18

I am trying to remove stopwords in my data with nltk, but after several attempts I am unable to remove the stopwords. The tokenization part of my code works, but I am unable to understand why stopwords does not work.

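import re
import nltk
from nltk.corpus import stopwords

def pre_process(text):
    # replace runs of digits, non-word characters and underscores with a space
    text = re.sub(r"(\d|\W|_)+", " ", text)
    # split into tokens on runs of non-word characters
    text = re.split(r"\W+", text)
    return text

text = dat['text'].apply(lambda x: pre_process(x))
nltk.download('stopwords')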

def remove_stopwords(text):
    for word in text:
        if word in stopwords.words('english'):
            text.remove(word)
        return text

text_stopword = text.apply(lambda x:remove_stopwords(x))

The code should remove words such as 'the', but after running my csv through the code, words such as 'the' are still present.

Current results:

text returns:

[tv, future, in, the, hands, of, viewers, with...

text_stopword returns:

[tv, future, in, the, hands, of, viewers, with...

Answer:

The return statement in your remove_stopwords function is indented incorrectly: it sits inside the for loop, so the function returns text right after the first iteration.

Please go with:

def remove_stopwords(text):
    # iterate over a copy: removing from the list being looped over skips items
    for word in list(text):
        if word in stopwords.words('english'):
            text.remove(word)
    return text
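
One more thing to watch for: calling remove on a list while looping over that same list can skip the element right after each removal, so consecutive stopwords such as 'in', 'the' may partly survive. A minimal sketch that avoids in-place removal by building a new list is shown here. The two-argument signature and the small STOP_WORDS set are assumptions made for illustration, so that the example runs without the NLTK corpus; they are not part of the original code.

```python
def remove_stopwords(tokens, stop_words):
    """Return a new list with the tokens that are not in stop_words."""
    # A list comprehension never mutates the list it reads from,
    # so no token is skipped.
    return [word for word in tokens if word not in stop_words]


# Hand-written stand-in for set(stopwords.words('english')),
# used only so this example runs without downloading the corpus.
STOP_WORDS = {"in", "the", "of", "with"}

tokens = ["tv", "future", "in", "the", "hands", "of", "viewers", "with"]
filtered = remove_stopwords(tokens, STOP_WORDS)
print(filtered)  # ['tv', 'future', 'hands', 'viewers']
```

With NLTK you would probably build the set once, for example stop_words = set(stopwords.words('english')), and then call text.apply(lambda x: remove_stopwords(x, stop_words)). Looking words up in a set is also much faster than calling stopwords.words('english') for every token.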
