I am learning how to use the Matcher in spaCy and ran into this unexpected behavior:
pnum1 = [{'TEXT':{'REGEX':fr"\d{1,4}"}}]
pnum2 = [{'TEXT':{'REGEX':fr"\d+"}}]
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
appli=''' it has three 56 cows 1087 10b, reg too long number: 12344'''
matcher = Matcher(nlp.vocab)
doc = nlp(appli)
matcher.add("num1",[pnum1])
#matcher.add("num2",[pnum2])
matches = matcher(doc)
reg =[{'TEXT': {'REGEX':fr"reg"}}]
#matcher.add("reg", [reg])
print(len(matches))
for match_id, start, end in matches:
    matched_span = doc[start:end]
    print('matched', matched_span.text)
The issue is that pattern pnum1, whose regex contains {}, does not match anything.
When I add pnum2 (by uncommenting the corresponding line), it works. The regex looks fine on regex101.
The expected result is all the numeric tokens with 1 to 4 digits. Any idea what is going on?
EDIT: With the pnum2 pattern, the matches include every token that is a number.
With the pnum1 pattern, there is not a single match.
What I don't understand is why \d{1,4} does not work in this context.
https://regex101.com/r/aTCo84/1
EDIT2: I am learning to use regex, so I do not want to use other token attributes such as ORTH, an is-number check, or similar.
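A quick check outside spaCy shows where pnum1 goes wrong: print the pattern string each literal actually produces. This minimal sketch uses only Python's re module; the token "56" is taken from the sample text, and the variable names are illustrative.

```python
import re

# In an f-string, {1,4} is a replacement field: Python evaluates the
# expression 1,4 (a tuple) and inserts its text, "(1, 4)".
pnum1_regex = fr"\d{1,4}"
# A plain raw string keeps {1,4} as the regex quantifier "1 to 4 times".
plain_regex = r"\d{1,4}"

print(pnum1_regex)  # \d(1, 4)
print(plain_regex)  # \d{1,4}

# The f-string version asks for a digit followed by the literal text "1, 4",
# so it cannot match a token such as "56".
print(re.search(pnum1_regex, "56"))  # None
print(re.search(plain_regex, "56"))  # <re.Match object; span=(0, 2), match='56'>
```

regex101 only ever sees the pattern you type into it, so it never shows the f-string substitution.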
Answer:
Convert the f-string to a normal raw string:
pnum1 = [
{
'TEXT': {'REGEX':r"\d{1,4}"}
}
]
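In f-strings, literal braces must be written as {{ and }}. A single {1,4} is treated as a replacement field: Python evaluates 1,4 as the tuple (1, 4), so the pattern becomes \d(1, 4), which requires a digit followed by the literal text "1, 4" and therefore never matches a number token. If you want to keep the f-string, double the braces instead: fr"\d{{1,4}}".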
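A second detail matters for the expected result. As far as I know, spaCy's REGEX operator tests the token text with re.search rather than a full match, so \d{1,4} also hits tokens like 12344 (it finds 1234 inside) or 10b. The sketch below emulates that check with plain re on a hand-written token list (an assumption: spaCy's real tokenization of the sentence may differ), and anchors the pattern with ^...$ so that only tokens made entirely of 1 to 4 digits are kept. The helper regex_hits is hypothetical, not a spaCy function.

```python
import re

# Doubled braces in an f-string produce literal { and }, so this equals r"\d{1,4}".
fstring_regex = fr"\d{{1,4}}"
plain_regex = r"\d{1,4}"
# Anchors require the whole token text to be 1 to 4 digits.
anchored_regex = r"^\d{1,4}$"

# Hypothetical token texts, hand-split from the sample sentence
# (spaCy's real tokenizer output may differ).
tokens = ["it", "has", "three", "56", "cows", "1087", "10b", ",",
          "reg", "too", "long", "number", ":", "12344"]


def regex_hits(pattern, texts):
    """Hypothetical helper: keep texts where re.search finds the pattern,
    mimicking a TEXT/REGEX token check."""
    return [t for t in texts if re.search(pattern, t)]


print(fstring_regex == plain_regex)        # True
print(regex_hits(plain_regex, tokens))     # ['56', '1087', '10b', '12344']
print(regex_hits(anchored_regex, tokens))  # ['56', '1087']
```

In the matcher, the anchored version would be written as pnum1 = [{'TEXT': {'REGEX': r"^\d{1,4}$"}}].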