I am having difficulty passing a dataframe column through the spaCy DependencyMatcher. I tried to adapt the solution from a previous question, 'Spacy Dependency Parsing with Pandas dataframe', but had no luck.
import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import DependencyMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN
nlp = spacy.load("en_core_web_lg")
text = 'REPAIRED CONNECTOR ON J3 SMS. REPLACED THE PRIMARY COMPUTER.'.lower()
dep_matcher = DependencyMatcher(vocab = nlp.vocab)
dep_pattern = [
    {
        "RIGHT_ID": "action",
        "RIGHT_ATTRS": {"LEMMA": {"IN": ["reseat", "cycle", "replace", "repair", "reinstall", "clean", "treat", "splice", "swap", "read", "inspect", "installed"]}},
    },
    {
        "LEFT_ID": "action",
        "REL_OP": ">",
        "RIGHT_ID": "component",
        "RIGHT_ATTRS": {"DEP": {"IN": ["dobj"]}},
    },
]
dep_matcher.add('maint_action' , patterns = [dep_pattern])
dep_matches = dep_matcher(doc)
for match in dep_matches:
    dep_pattern = match[0]
    matches = match[1]
    verb, subject = matches[0], matches[1]
    print(nlp.vocab[dep_pattern].text, '\t', doc[verb], doc[subject])
>>>maint_action repaired connector
>>>maint_action replaced computer
Passing a string, the above works perfectly, but when I try passing a DataFrame column, the new column comes back blank.
Here's the function for the DataFrame:
import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import DependencyMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN
nlp = spacy.load("en_core_web_lg")
data = {'new': ['repaired computer and replaced connector.', 'spliced wire on connector.', 'cycled power and reseated connectors and replaced computer on transmitter.']}
df = pd.DataFrame(data)
dep_matcher = DependencyMatcher(vocab = nlp.vocab)
dep_pattern = [
    {
        "RIGHT_ID": "action",
        "RIGHT_ATTRS": {"LEMMA": {"IN": ["reseat", "cycle", "replace", "repair", "reinstall", "clean", "treat", "splice", "swap", "read", "inspect", "installed"]}},
    },
    {
        "LEFT_ID": "action",
        "REL_OP": ">",
        "RIGHT_ID": "component",
        "RIGHT_ATTRS": {"DEP": {"IN": ["dobj"]}},
    },
]
dep_matcher.add('maint_action' , patterns = [dep_pattern])
dep_matches = dep_matcher(doc)
def find_matches(text):
    doc = nlp(text)
    rule3_pairs = []
    for match in dep_matches:
        dep_pattern = match[0]
        matches = match[1]
        verb, subject = matches[0], matches[1]
        A = (nlp.vocab[dep_pattern].text, '\t', doc[verb], doc[subject])
        rule3_pairs.append(A)
        return rule3_pairs
df['three_tuples'] = df['new'].apply(find_matches)
I am trying to have each row that matches the pattern output the respective verb and noun combination, such as:
|three_tuples|
|maint_action repaired computer replaced connector|
|maint_action spliced wire|
|maint_action cycled power reseated connectors replaced computer|
CodePudding user response:
I ran your code exactly as it is (the second sample), and it already produces the results you want.
There is one small problem in the first code sample: you never create the doc:
doc = nlp(text)
But I don't think that's what's causing the issue. Maybe try restarting your kernel if you're using Jupyter.
Update
After your edit, I noticed a lot of indentation errors; please fix those. Also, you are calling dep_matcher outside the function rather than inside it, so the matches are never computed for the text of each row. That's why it doesn't work.
Finally, the return statement inside the for loop ends the loop after the first match. Move the return out of the loop if you want all the results.
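To make the return problem concrete, here is a minimal sketch that does not need spaCy at all. The `fake_matches` list is a hypothetical stand-in that only mimics the `(match_id, [token_ids])` shape the DependencyMatcher returns, and both function names are made up for illustration.

```python
# Hypothetical stand-in for DependencyMatcher output: (match_id, [token_ids]) tuples.
fake_matches = [(101, [0, 1]), (101, [3, 4]), (101, [6, 7])]


def collect_early_return(matches):
    """Buggy version: the return sits inside the loop body."""
    pairs = []
    for match_id, token_ids in matches:
        pairs.append((token_ids[0], token_ids[1]))
        return pairs  # exits the function on the first iteration
    return pairs  # only reached when there are no matches at all


def collect_all(matches):
    """Fixed version: the return runs once, after the loop has finished."""
    pairs = []
    for match_id, token_ids in matches:
        pairs.append((token_ids[0], token_ids[1]))
    return pairs


print(collect_early_return(fake_matches))  # [(0, 1)]
print(collect_all(fake_matches))           # [(0, 1), (3, 4), (6, 7)]
```

The only difference between the two functions is the indentation level of the return.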
Here's the code that worked for me:
def find_matches(text):
    doc = nlp(text)
    dep_matches = dep_matcher(doc)
    rule3_pairs = []
    for match in dep_matches:
        dep_pattern = match[0]
        matches = match[1]
        verb, subject = matches[0], matches[1]
        A = (nlp.vocab[dep_pattern].text, doc[verb], doc[subject])
        rule3_pairs.append(A)
    return rule3_pairs
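The function above gives you a list of tuples per row. If you would rather have one string per row, as in your expected output, something along these lines might work. `format_pairs` is a hypothetical helper name, and the sketch assumes every tuple in a row carries the same label, which holds here because only one pattern ('maint_action') is registered. Plain strings stand in for the spaCy tokens so that it runs without a model.

```python
def format_pairs(pairs):
    """Collapse [(label, verb, obj), ...] into 'label verb obj verb obj ...'."""
    if not pairs:
        return ""  # no match in this row
    label = pairs[0][0]  # assumption: all tuples in the row share one label
    # str() lets the same code handle plain strings and token-like objects.
    words = [str(part) for _, verb, obj in pairs for part in (verb, obj)]
    return " ".join([label] + words)


row = [
    ("maint_action", "repaired", "computer"),
    ("maint_action", "replaced", "connector"),
]
print(format_pairs(row))  # maint_action repaired computer replaced connector
```

You could then chain it after the matcher step, for example df['new'].apply(find_matches).apply(format_pairs).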
Please take a look at https://stackoverflow.com/help/how-to-ask