Suppose I have the following dictionary:
data = {
    'ACCOUNT_CLOSURE': [
        'account closure',
        'close account',
        'close bank',
        'terminate account',
        'account deletion',
        'cancel account',
        'account cancellation',
    ],
    'ACCOUNT_CHANGE': [
        'change my account',
        'switch my account',
        'change from private into savings',
        'convert into family package',
        'change title of the account',
        'make title account to family',
        'help me access the documentation',
    ],
}
I want to go through each key, then through each element of its list of values, and drop the stopwords, so I do:
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
for key, values in data.items():
    data[key] = [value for value in values if value not in stop_words]
But this leaves the dictionary exactly as it was. What am I doing wrong?
CodePudding user response:
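The stopword set from the nltk library contains individual words, not phrases. Your comprehension tests each whole phrase (for example 'close account') for membership in that set, so nothing ever matches and nothing is removed. You need to split each phrase into words, filter those words, and join what remains. Try this code: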
for key, values in data.items():
    data[key] = [
        " ".join([word for word in value.split() if word not in stop_words])
        for value in values
    ]
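If it helps, here is a self-contained sketch of the same idea that runs without downloading the NLTK corpus. The small `STOP_WORDS` set is only a stand-in for `set(stopwords.words("english"))`, and the helper name `strip_stopwords` is made up for this example. It also makes two optional choices that go beyond the code above: it compares words case-insensitively, and it skips any phrase that becomes empty after filtering.

```python
# Stand-in for set(stopwords.words("english")); an assumption for this sketch,
# not the real NLTK list.
STOP_WORDS = {"my", "into", "of", "the", "to", "from", "me"}


def strip_stopwords(phrases, stop_words):
    """Remove stopwords from each phrase; skip phrases that end up empty."""
    cleaned = []
    for phrase in phrases:
        # Compare lowercased words so "The" is filtered like "the".
        kept = [word for word in phrase.split() if word.lower() not in stop_words]
        if kept:
            cleaned.append(" ".join(kept))
    return cleaned


data = {
    "ACCOUNT_CHANGE": [
        "change my account",
        "Change title of the account",
        "to the",  # consists only of stopwords, so it is dropped
    ],
}

# A whole phrase is never a member of a set of single words.
print("change my account" in STOP_WORDS)  # False

# Build a new dict rather than mutating the original while iterating.
result = {key: strip_stopwords(values, STOP_WORDS) for key, values in data.items()}
print(result)  # {'ACCOUNT_CHANGE': ['change account', 'Change title account']}
```

Building a new dictionary keeps the original `data` intact, which can be handy if you want to compare before and after.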