I'm using an incident database to identify the causes of accidents. I have defined a pattern and a function that extracts the matching spans, but the function sometimes returns overlapping results. I saw in a previous post that looping with for span in spacy.util.filter_spans(spans): avoids the repeated answers, but I don't know how to rewrite my function to use it. I would be grateful for any help.
from spacy.matcher import Matcher

# nlp is assumed to be an already loaded pipeline with a dependency parser,
# e.g. nlp = spacy.load("en_core_web_sm")
pattern111 = [{'DEP': 'compound', 'OP': '?'}, {'DEP': 'nsubj'}]

def get_relation111(x):
    doc = nlp(x)
    matcher = Matcher(nlp.vocab)
    relation = []
    matcher.add("matching_111", [pattern111], on_match=None)
    matches = matcher(doc)
    for match_id, start, end in matches:
        matched_span = doc[start:end]
        relation.append(matched_span.text)
    return relation
CodePudding user response:
filter_spans can be used on any list of spans. When spans overlap, it keeps the longest one (the earliest one if there is a tie). It's a little awkward here because you want a list of strings, but you can work around that by collecting the spans first and only converting them to strings after you've filtered.
from spacy.util import filter_spans

def get_relation111(x):
    doc = nlp(x)
    matcher = Matcher(nlp.vocab)
    relation = []
    matcher.add("matching_111", [pattern111], on_match=None)
    matches = matcher(doc)
    for match_id, start, end in matches:
        matched_span = doc[start:end]
        # Keep the Span itself (not its text) so it can be filtered below
        relation.append(matched_span)
    # Just add this line: drop overlapping spans, then convert to strings
    relation = [ss.text for ss in filter_spans(relation)]
    return relation
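
If it helps to see why the duplicates go away: because the compound token in the pattern is optional, a phrase like "gas leak" is matched twice, once as "gas leak" and once as just "leak". Below is a rough pure-Python sketch of the selection rule that filter_spans is documented to apply, written against plain (start, end) token offsets so it runs without spaCy. The helper name keep_longest_spans, the sample sentence, and the match offsets are all made up for illustration; this is not spaCy's actual code.

```python
def keep_longest_spans(spans):
    """Drop overlapping (start, end) token ranges, preferring longer ones.

    Hypothetical helper that approximates spacy.util.filter_spans on
    plain tuples; `end` is exclusive, as with spaCy spans.
    """
    # Longest span first; among equal lengths, the one that starts earliest.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    seen_tokens = set()
    for start, end in ordered:
        # Skip a span if any of its tokens is already covered by a kept span.
        if not seen_tokens.intersection(range(start, end)):
            kept.append((start, end))
            seen_tokens.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)


# Assumed example: "gas" is a compound modifier of the subject "leak",
# so the pattern would match both tokens 1-2 and token 2 alone.
tokens = ["The", "gas", "leak", "damaged", "the", "pipeline"]
matches = [(1, 3), (2, 3)]

filtered = keep_longest_spans(matches)
texts = [" ".join(tokens[start:end]) for start, end in filtered]
print(texts)  # ['gas leak']
```

The shorter match is dropped because its only token is already covered by the longer one, which is the same effect you get from calling filter_spans on the list of Span objects before taking .text.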