I am trying to use spaCy in Python to detect the word "grief" in any form, such as "I am grieving", "going through grief", or "I grieved over __", including when it is written in all caps. I am pretty new to Python, so I don't know lemmatization that well. Is there a simple if statement that could solve this using spaCy?
grief = str(input(("What is currently on your mind? ")))
doc = nlp(grief)
if [t.grief for t in doc if t.lemma_ == "grie"]:
    grief1(sad_value)
CodePudding user response:
There are two lemmas you need to check for: "grief" (the noun) and "grieve" (the verb). Here is a solution that makes use of the spaCy lemmatiser:
import spacy
# Load the small English pipeline; NER is excluded because it is not needed here
nlp = spacy.load('en_core_web_sm', exclude=["ner"])
grief = input("What is currently on your mind? ")
# Input: "I am grieving"
doc = nlp(grief)
for t in doc:
    # Match either the noun lemma or the verb lemma
    if t.lemma_ == "grief" or t.lemma_ == "grieve":
        print("Found {}".format(t.lemma_))
# Output: "Found grieve"
Examples for testing
import spacy
nlp = spacy.load('en_core_web_sm', exclude=["ner"])
texts = ["I am grieving", "Going through grief", "I will grieve", "I grieved", "He grieves"]
# nlp.pipe processes the texts as a batch
docs = list(nlp.pipe(texts))
for doc in docs:
    print(doc.text)
    for t in doc:
        if t.lemma_ == "grief" or t.lemma_ == "grieve":
            print("\t-> Found {}".format(t.lemma_))
# Output
# I am grieving
#     -> Found grieve
# Going through grief
#     -> Found grief
# I will grieve
#     -> Found grieve
# I grieved
#     -> Found grieve
# He grieves
#     -> Found grieve
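Since the question also asks about all caps, the lemma check above can be wrapped in a small case-insensitive helper. This is only a sketch: the names `mentions_grief` and `TARGET_LEMMAS` are made up for illustration, and the stand-in tokens below merely mimic the `lemma_` attribute of spaCy tokens so the snippet runs without a model installed.

```python
from types import SimpleNamespace

# Hypothetical names, not part of spaCy
TARGET_LEMMAS = {"grief", "grieve"}

def mentions_grief(tokens, targets=TARGET_LEMMAS):
    """Return True if any token's lemma, lowercased, is one of the targets."""
    return any(t.lemma_.lower() in targets for t in tokens)

# Stand-in tokens that only carry a lemma_ attribute, like spaCy tokens do
fake_doc = [
    SimpleNamespace(lemma_="I"),
    SimpleNamespace(lemma_="be"),
    SimpleNamespace(lemma_="GRIEVE"),
]
print(mentions_grief(fake_doc))  # True
```

With a real pipeline this would be called as `if mentions_grief(nlp(text)):`. Whether spaCy lemmatises an all-caps word such as "GRIEVING" correctly may depend on how the model tags it, so passing `text.lower()` to `nlp` is one workaround worth trying.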
Alternatively, you can use stemming via NLTK's SnowballStemmer implementation:
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize
# Note: word_tokenize needs NLTK's Punkt tokenizer data, downloaded once via nltk.download
stemmer = SnowballStemmer(language='english')
grief = input("What is currently on your mind? ")
for token in word_tokenize(grief):
    stem = stemmer.stem(token)
    # "grief" stems to "grief"; "grieve", "grieved", "grieving" stem to "griev"
    if stem == 'grief' or stem == 'griev':
        print("Found {}".format(stem))
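If installing a model or tokenizer data is not an option, a plain regular expression is a rougher fallback. This is a different technique from lemmatization or stemming: it only matches the word forms listed explicitly in the pattern, and the names `GRIEF_PATTERN` and `find_grief_words` are hypothetical. It does ignore case, so all-caps input is covered.

```python
import re

# Explicit list of forms: grief, griefs, grieve, grieves, grieved, grieving
GRIEF_PATTERN = re.compile(r"\b(?:griefs?|griev(?:e|es|ed|ing))\b", re.IGNORECASE)

def find_grief_words(text):
    """Return every matching word form found in text, in order."""
    return GRIEF_PATTERN.findall(text)

print(find_grief_words("I am GRIEVING, going through grief."))
# ['GRIEVING', 'grief']
```

Unlike the lemmatiser, this knows nothing about grammar, so any form not spelled out in the pattern (for example "grievance") is not matched.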