I want to check in a Python program whether a given English sentence consists entirely of meaningless words.
Return True if every word in the sentence has no meaning,
e.g. sdfsdf sdf ssdf fsdf dsd sd
Return False if the sentence contains at least one word that has meaning,
e.g. Hello asdf
Here is the code I wrote:
import nltk
nltk.download('words')
from nltk.corpus import words

def is_sentence_meaningless(sentence):
    is_meaningless = False
    for word in sentence.split():
        if word in words.words():
            is_meaningless = True
            break
    return is_meaningless

print(is_sentence_meaningless("sss sss asdfasdf asdfasdfa asdfasfsd"))
print(is_sentence_meaningless(" sss sss asdfasdf asdfasdfa asdfasfsd TEST"))
Is there a better alternative to this code? Also, how can I add my own corpus to it? For example, I have a few domain-specific words that I want it to treat as meaningful. Is that possible?
CodePudding user response:
You can use the set.difference method. Note that since the words in nltk.corpus.words are mostly lower case, you have to use the str.lower method as well (e.g. "hello" is in the list but "Hello" isn't):
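def is_sentence_meaningless(sentence, domain_specific_words):
    # Lower-case the sentence because the corpus entries are mostly lower case
    s_set = set(sentence.lower().split())
    # Combine the corpus with the domain-specific words (any iterable of strings)
    known_words = words.words() + list(domain_specific_words)
    # Meaningless only if removing every known word leaves the set unchanged
    return s_set.difference(known_words) == s_set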
Just FYI, your function does not do what your explanation says: it returns True as soon as one word of the sentence is found in the corpus.
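To make the difference between the two behaviours concrete, here is a minimal sketch. The names VOCABULARY, has_no_known_word and has_known_word are made up for illustration, and the tiny vocabulary is only a stand-in so the snippet runs without downloading anything; in practice it could be set(words.words()) plus the domain-specific words.

```python
# Stand-in vocabulary (assumption): replace with set(words.words())
# plus any domain-specific terms in a real program.
VOCABULARY = {"hello", "test", "world"}


def has_no_known_word(sentence, vocabulary=VOCABULARY):
    """Behaviour described in the question: True only if no word is known."""
    tokens = set(sentence.lower().split())
    return tokens.isdisjoint(vocabulary)


def has_known_word(sentence, vocabulary=VOCABULARY):
    """Roughly what the question's code computes (ignoring case):
    True as soon as one word is known."""
    return any(word in vocabulary for word in sentence.lower().split())


print(has_no_known_word("sdfsdf sdf ssdf fsdf dsd sd"))  # True
print(has_no_known_word("Hello asdf"))                   # False
print(has_known_word("Hello asdf"))                      # True
```

The two functions are exact opposites of each other, which is why the question's code prints the inverse of what the description asks for.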
CodePudding user response:
Given that the word list contains only unique words, the function can be made more efficient by converting the list to a set: a membership test on a set takes constant time on average, whereas on a list it scans the entries one by one.
Also, your logic doesn't seem to align with the implied purpose of the function (based on its name). A sentence would be meaningless if any of the words in the sentence is not found in the corpus set.
There is considerable overhead in converting the word list to a set. Therefore, if the function is going to be used multiple times, it would be better to wrap it in a class so that the set is built only once.
Thus:
import nltk.corpus

class sentence_checker:
    def __init__(self):
        # Build the set once; every later lookup reuses it
        self.words = set(nltk.corpus.words.words())

    def is_sentence_meaningless(self, sentence):
        for word in sentence.split():
            if word not in self.words:
                return True
        return False

sc = sentence_checker()
print(sc.is_sentence_meaningless('hello'))
print(sc.is_sentence_meaningless('hellfffo'))
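The class above does not cover the domain-specific words from the question. One possible way to add them is sketched below. SentenceChecker, base_words and domain_words are hypothetical names, and the word list is a toy one so the snippet runs on its own; with NLTK installed and the corpus downloaded, nltk.corpus.words.words() could be passed as base_words instead. Note that this sketch follows the definition in the question (meaningless only if no word is known), not the "any unknown word" rule used in the class above, and it also lower-cases the input and strips surrounding punctuation.

```python
import string


class SentenceChecker:
    """Hypothetical variant that also accepts domain-specific words."""

    def __init__(self, base_words, domain_words=()):
        # Build the lower-cased lookup set once; later checks reuse it.
        self.known = {w.lower() for w in base_words}
        self.known.update(w.lower() for w in domain_words)

    def _tokens(self, sentence):
        # Lower-case and strip surrounding punctuation such as "," or "!"
        stripped = (t.strip(string.punctuation) for t in sentence.lower().split())
        return [t for t in stripped if t]

    def is_sentence_meaningless(self, sentence):
        # True only if none of the tokens is a known word
        return not any(t in self.known for t in self._tokens(sentence))


# Toy data for illustration only; "kubectl" stands in for a domain term.
checker = SentenceChecker(["hello", "test"], domain_words=["kubectl"])
print(checker.is_sentence_meaningless("sss asdfasdf"))        # True
print(checker.is_sentence_meaningless("sss asdfasdf TEST!"))  # False
print(checker.is_sentence_meaningless("run kubectl now"))     # False
```

Keeping the domain words in the same set as the corpus means a single lookup per token covers both sources.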