Please help me!
I am parsing strings that contain weights. But here is the catch: some strings contain a range (see the third example below), which I consider an ambiguous value and do not want to match at all.
Examples:
1.0kg - should return group(1)='1.0', group(2)='kg'
400.00g - should return group(1)='400.00', group(2)='g'
100-800g - currently returns group(1)='800', group(2)='g', but should not match at all!
The regex I am using right now is:
r"([\d.,]+)(g|kg)"
How can I modify it so that the third example does not return a match?
Right now I check whether the string contains '-' before applying the regex, but I wonder whether the regex pattern itself can do this, without extra if-else statements.
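A minimal Python sketch that reproduces the problem, assuming the intended pattern is ([\d.,]+)(g|kg) (the form that gives the results listed above). re.search retries the match at every later start position, so after failing at 100 (which is followed by '-', not a unit) it succeeds at 800g. The helper name parse_weight is hypothetical.

```python
import re

# Pattern from the question (assumed to use "+" after the character class).
QUESTION_PATTERN = re.compile(r"([\d.,]+)(g|kg)")

def parse_weight(text):
    """Hypothetical helper: return (number, unit) for the first match, or None."""
    m = QUESTION_PATTERN.search(text)
    return (m.group(1), m.group(2)) if m else None

for s in ["1.0kg", "400.00g", "100-800g"]:
    print(s, parse_weight(s))
# 1.0kg ('1.0', 'kg')
# 400.00g ('400.00', 'g')
# 100-800g ('800', 'g')   <- search skips past "100-" and matches the upper bound
```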
Answer:
You may use the following regex pattern:
(?<!-)\b\d+(?:\.\d+)?\w?g
This pattern rejects a number that is immediately preceded by a hyphen, while still requiring that the number starts at a word boundary. The word boundary matters: without it, the search could simply start one digit later (at the 00g of 800g), where the lookbehind no longer sees a hyphen.
Explanation:
(?<!-)       assert that a hyphen does not precede the number (eliminates 100-800g)
\b           but still match a word boundary
\d+          match an integer
(?:\.\d+)?   optional decimal component
\w?          optional single-letter unit prefix in front of grams, e.g. the k in kg
g            match 'g' for grams
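A quick Python check of this pattern against the examples from the question, written with an optional \w? so that a plain gram value such as 5g also matches (a sketch; the name WEIGHT is just for illustration). The last string shows a limitation worth knowing: a word boundary also exists right after a decimal point, so in a decimal range such as 1.5-2.5kg the search can start at the final 5 and still return 5kg.

```python
import re

# Negative lookbehind for "-", word boundary, integer, optional decimals,
# optional single word character (e.g. the "k" of "kg"), then "g".
WEIGHT = re.compile(r"(?<!-)\b\d+(?:\.\d+)?\w?g")

for s in ["1.0kg", "400.00g", "5g", "100-800g", "1.5-2.5kg"]:
    m = WEIGHT.search(s)
    print(s, m.group(0) if m else None)
# 1.0kg 1.0kg
# 400.00g 400.00g
# 5g 5g
# 100-800g None
# 1.5-2.5kg 5kg
```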
Here is a working demo.
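If you also want the group(1)/group(2) split from the question, one possible variant (a hypothetical sketch, not the answer's pattern) drops the \b and uses a wider negative lookbehind instead: the match may not start right after a digit, '.', ',' or '-'. That keeps the match from beginning mid-number and also rejects decimal ranges such as 1.5-2.5kg. Accepting ',' as a decimal separator is an assumption carried over from the question's [\d.,] class.

```python
import re

# Hypothetical variant: group 1 = number, group 2 = unit.
# The lookbehind forbids starting right after a digit, '.', ',' or '-',
# so neither the middle of a number nor the upper bound of a range can match.
WEIGHT = re.compile(r"(?<![\d.,-])(\d+(?:[.,]\d+)?)(kg|g)")

def parse_weight(text):
    """Return (number, unit) for the first unambiguous weight, else None."""
    m = WEIGHT.search(text)
    return (m.group(1), m.group(2)) if m else None

for s in ["1.0kg", "400.00g", "100-800g", "1.5-2.5kg"]:
    print(s, parse_weight(s))
# 1.0kg ('1.0', 'kg')
# 400.00g ('400.00', 'g')
# 100-800g None
# 1.5-2.5kg None
```

Neither this variant nor the pattern above rejects a range written with spaces around the hyphen (100 - 800g); that input would need extra handling.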