Let's assume we have the following string:
This thing costs $5000.
I'm trying to match $5000 using a negative lookbehind:
(?<!([:;]))\$?([0-9]+)
The pattern should not match when ";" or ":" comes directly before the amount, e.g. ;$5000 or ;5000.
First string:
This thing costs $5000
or
5000
Desired output:
$5000
or
5000
Second string:
This thing costs ;$5000
or
;5000
Desired output:
None
CodePudding user response:
Your pattern is close, but it has one flaw: the lookbehind only checks the single character before the position where the match starts, so the match can still begin at the $ sign or in the middle of the digits.
Add \d and $ to the negative lookbehind and it will work:
(?<![;:\d$])\$?([0-9]+)
Examples:
>>> import re
>>> re.findall(r"(?<![;:\d$])\$?([0-9]+)", "This thing costs ;$5000")
[]
>>> re.findall(r"(?<![;:\d$])\$?([0-9]+)", "This thing costs $5000")
['5000']
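The flaw is easier to see by running the question's pattern next to the corrected one. The sketch below is only an illustration, assuming Python's re module: it uses re.finditer so that the whole matched text is visible rather than just the capture group. The helper name full_matches and the WITH_DOLLAR variant (which moves the $ inside the group so that findall returns it) are not from the answer; they are made up for this example.

```python
import re

ORIGINAL = r"(?<!([:;]))\$?([0-9]+)"      # question's pattern: rejects only ':' and ';'
CORRECTED = r"(?<![;:\d$])\$?([0-9]+)"    # also rejects a preceding digit or '$'
WITH_DOLLAR = r"(?<![;:\d$])(\$?[0-9]+)"  # assumption: '$' moved inside the group

def full_matches(pattern, text):
    """Return the whole matched text (group 0) for every match in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

for text in ["costs ;$5000", "costs ;5000", "costs $5000"]:
    print(text, full_matches(ORIGINAL, text), full_matches(CORRECTED, text))
# costs ;$5000 ['5000'] []
# costs ;5000 ['000'] []
# costs $5000 ['$5000'] ['$5000']

print(re.findall(WITH_DOLLAR, "This thing costs $5000"))  # ['$5000']
```

With the original pattern, ;$5000 still yields 5000 because the character before the 5 is $, not ; or :, and ;5000 yields 000 because the character before the first 0 is a digit.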
That said, I'd suggest matching the number you want directly instead of relying on a negative lookbehind, like so:
re.findall(r"\s\$?([0-9]+)", "This thing costs ;$5000")
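As a rough check of this alternative, the sketch below runs the whitespace-anchored pattern over the four strings from the question. The last sample is a hypothetical input added here to show one caveat: because \s has to consume a character, a number at the very start of the string is not matched.

```python
import re

pattern = r"\s\$?([0-9]+)"

samples = [
    "This thing costs $5000",
    "This thing costs 5000",
    "This thing costs ;$5000",
    "This thing costs ;5000",
    "5000 is the price",  # hypothetical input: no whitespace before the number
]

# Map each sample to the list of captured digit runs.
results = {s: re.findall(pattern, s) for s in samples}
for s, found in results.items():
    print(repr(s), found)
# 'This thing costs $5000' ['5000']
# 'This thing costs 5000' ['5000']
# 'This thing costs ;$5000' []
# 'This thing costs ;5000' []
# '5000 is the price' []
```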
CodePudding user response:
Sometimes matching what you do want is easier than excluding what you don't want. As far as I can tell, what you're really looking for is an integer that optionally starts with $, and you still want to capture the $ if it's there. The ; and : are just red herrings.
import re
values = ['This thing costs $5000.',   # $5000
          'This thing costs ;$5000.',  # None
          'This thing costs 5000.',    # 5000
          'This thing costs ;5000.',   # None
          'This thing costs8000']      # None
# Whitespace, then an optional $ followed by one or more digits (captured as group 1).
pattern = r'.*\s(\$?\d+)'
for value in values:
    # If a match is made, we want group 1 from that match.
    # The := operator requires Python 3.8 or later.
    if match := re.match(pattern, value):
        print(match.group(1))
    else:
        # No match: match is None here.
        print(match)
Output:
$5000
None
5000
None
None
See https://regexr.com/6sqbs for an in-depth explanation of my pattern .*\s(\$?\d+)
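One behaviour of this pattern worth knowing, offered as an observation rather than something stated in the answer: with re.match, the greedy .* backtracks from the end of the string, so when a string contains several amounts only the last one is captured. The sketch below uses a hypothetical input and an assumed re.findall variant that collects every amount instead.

```python
import re

text = "Costs $5 or $10"  # hypothetical input with two amounts

# The answer's pattern: greedy .* backtracks from the end of the string,
# so the amount closest to the end is the one captured.
last = re.match(r'.*\s(\$?\d+)', text)
print(last.group(1))  # $10

# Assumed variant: (?:^|\s) accepts start-of-string or whitespace before the amount.
every = re.findall(r'(?:^|\s)(\$?\d+)', text)
print(every)  # ['$5', '$10']
```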