Regex spacy matcher not working as expected in spacy

Time:07-29

I am learning how to use the Matcher in spaCy and ran into this unexpected behavior:

pnum1 = [{'TEXT':{'REGEX':fr"\d{1,4}"}}]
pnum2 = [{'TEXT':{'REGEX':fr"\d"}}]


import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")

appli=''' it  has three 56 cows 1087 10b, reg too long number: 12344'''
matcher = Matcher(nlp.vocab)
doc = nlp(appli)

matcher.add("num1",[pnum1])
#matcher.add("num2",[pnum2])
matches = matcher(doc)

reg  =[{'TEXT': {'REGEX':fr"reg"}}]
#matcher.add("reg", [reg])

print(len(matches))
for match_id, start, end in matches:
    matched_span = doc[start:end] 
    print('matched',matched_span.text)

The issue is that pnum1, whose regex uses the {} quantifier, does not match anything.

When I add pnum2 (by uncommenting the corresponding line), it works. Both regexes look fine on regex101.

The expected result is all numeric tokens with 1 to 4 digits. Any idea what is going on?

EDIT: With the pnum2 pattern, the matches include every numeric token.

With the pnum1 pattern, there is not a single match.

What I don't understand is why \d{1,4} does not work in this context.

https://regex101.com/r/aTCo84/1

EDIT2: I am learning to use regex, so I do not want to use other kinds of matching such as ORTH, an is-number attribute, or the like.

Answer:

Use a plain raw string instead of an f-string:

pnum1 = [
    {
        'TEXT': {'REGEX':r"\d{1,4}"}
    }
]

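In an f-string, { and } delimit replacement fields, so in fr"\d{1,4}" the {1,4} part is evaluated as the Python tuple (1, 4) and the pattern actually passed to spaCy becomes \d(1, 4), which matches none of the tokens. That is also why pnum2 works: fr"\d" contains no braces. To get literal braces inside an f-string, double them as {{ and }}.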
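The effect can be reproduced with nothing but Python's re module. This is a minimal sketch under a few assumptions: it treats spaCy's REGEX operator as testing each token's text with re.search (so an unanchored pattern can match just part of a token), the token list is a hand-written approximation of how the question's sentence might be tokenized rather than real spaCy output, and regex_matches is a hypothetical helper for illustration only.

```python
import re

# Hand-written token texts approximating the question's sentence
# (an assumption; real spaCy tokenization may differ, e.g. whitespace tokens).
tokens = ["it", "has", "three", "56", "cows", "1087", "10b", ",",
          "reg", "too", "long", "number", ":", "12344"]

# In an f-string, {1,4} is evaluated as the Python tuple (1, 4).
broken = fr"\d{1,4}"
# Doubled braces stay literal; a plain raw string needs no escaping at all.
escaped = fr"\d{{1,4}}"
plain = r"\d{1,4}"

print(broken)            # \d(1, 4)
print(escaped == plain)  # True


def regex_matches(pattern, texts):
    """Hypothetical helper: keep the texts in which re.search finds pattern,
    mimicking (by assumption) how the Matcher's REGEX operator tests tokens."""
    compiled = re.compile(pattern)
    return [t for t in texts if compiled.search(t)]


print(regex_matches(broken, tokens))        # []
print(regex_matches(plain, tokens))         # ['56', '1087', '10b', '12344']
print(regex_matches(r"^\d{1,4}$", tokens))  # ['56', '1087']
```

Note that the corrected but unanchored pattern still accepts 10b and 12344, because search only needs a 1 to 4 digit run somewhere in the token. If the goal is tokens made up of 1 to 4 digits only, anchoring the pattern (for example r"^\d{1,4}$") is likely what you want in the Matcher pattern as well.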
