I have the following text: a="VAT number 12345678901 mobile number 34567890234". I want to find only the number that corresponds to the VAT number, which is made up of 11 digits (i.e. 12345678901), and I don't want to find 34567890234.
The regex I use is:
rgx = "(?<!\d)\d{11}(?!\d)"
but re.findall(rgx, a) gives me both 34567890234 and 12345678901.
Any idea?
CodePudding user response:
For the exact string a="VAT number 12345678901 mobile number 34567890234", the following pattern looks for 11 digits followed by a whitespace character and the word mobile, but returns only the digits:
rgx = r"\d{11}(?=\smobile)"
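As a minimal sketch of the lookahead approach above, using the string from the question (the variable name `matches` is only illustrative):

```python
import re

a = "VAT number 12345678901 mobile number 34567890234"

# 11 digits, but only when followed by whitespace and the word "mobile".
# The lookahead (?=...) checks the following text without consuming it,
# so only the digits end up in the result.
rgx = r"\d{11}(?=\smobile)"

matches = re.findall(rgx, a)
print(matches)  # ['12345678901']
```

Note that this depends on the word "mobile" coming directly after the VAT number, so it only suits input that is laid out like the example string.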
There are a lot of browser-based regular expression builders out there, and they are a great resource for learning.
Your original expression uses the negative lookaround expressions (?<!\d) and (?!\d). It matches both numbers because each one is a run of exactly 11 digits with no digit on either side. Negative lookarounds are not supported in every regex engine, so I tend to avoid them. Additionally, in terms of language structure, detecting the presence of something is generally more precise than detecting its absence. If someone asks you what you want to drink and you reply "not poison" when you want a soda, you are less likely to get a soda.
So positive lookaround expressions, such as the lookahead (?=abc) and the lookbehind (?<=abc), will be more robust.
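A positive lookbehind, written (?<=...), can anchor the match on the text that precedes the number instead. The following is a sketch, assuming the label is always the literal text "VAT number " with single spaces; Python's re module requires a lookbehind to have a fixed width, so \s* cannot be used inside it. The names `rgx_lookbehind` and `vat` are illustrative.

```python
import re

a = "VAT number 12345678901 mobile number 34567890234"

# (?<=VAT number ) asserts that the literal label comes right before the digits.
# (?!\d) rejects runs longer than 11 digits.
rgx_lookbehind = r"(?<=VAT number )\d{11}(?!\d)"

vat = re.findall(rgx_lookbehind, a)
print(vat)  # ['12345678901']
```

Unlike the lookahead version, this does not depend on what follows the VAT number, only on the label in front of it.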
CodePudding user response:
Try this:
(?:VAT\s*number\s*)(\d{11})\s
The non-capturing group (?:VAT\s*number\s*) ensures that the number is searched for only after the text "VAT number".
The block (\d{11})\s captures the VAT number only if it consists of 11 digits followed by a whitespace character.
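Here is a short sketch of how this pattern behaves with re.findall, which returns only the captured group when the pattern contains exactly one capturing group. The second pattern, `rgx_end`, is not part of the answer above; it is a hypothetical variant that swaps the trailing \s for (?!\d) so that a VAT number at the very end of the string is also found.

```python
import re

a = "VAT number 12345678901 mobile number 34567890234"

# Pattern from the answer: label in a non-capturing group, digits in a capturing group.
rgx_group = r"(?:VAT\s*number\s*)(\d{11})\s"
result = re.findall(rgx_group, a)
print(result)  # ['12345678901']

# Assumed variant: (?!\d) instead of \s, so no trailing whitespace is required.
rgx_end = r"VAT\s*number\s*(\d{11})(?!\d)"
result_end = re.findall(rgx_end, "VAT number 12345678901")
print(result_end)  # ['12345678901']
```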