I'm trying to write a regex to capture any measurement unit in a string, considering that the unit can be before or after the number.
What I have come up with so far are two regexes.
/\d*\.?,?\d+\s?(kg|g|l)/gi
that matches with
ABC 200g
EFG 5,4 Kg
HIL 2x20l
And (kg|g|l)\s?\d+,?\.?\d*
that matches with:
ABC g200
EFG kg 5,4
HIL l 20x2
How can I combine the two regexes into one that matches both:
ABC g200
EFG 5,4 Kg
CodePudding user response:
With a case-insensitive pattern, match an optional k followed by g, or match l, and use an alternation | to match the pattern the other way around as well.
The optional dot and comma can go in a character class [.,]? because \.?,? can also match both together, like .,
The word boundaries \b prevent the unit from matching as part of a longer word.
\d*[.,]?\d+\s*(?:k?g|l)\b|\b(?:k?g|l)\s*\d*[.,]?\d+
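As a rough check, the combined pattern can be run against the samples from the question. This is only a sketch: the helper name findMeasurements is made up for illustration, and the g and i flags are taken from the question's first regex.

```javascript
// Combined pattern from the answer, with the g and i flags used in the question.
const unitRegex = /\d*[.,]?\d+\s*(?:k?g|l)\b|\b(?:k?g|l)\s*\d*[.,]?\d+/gi;

// Hypothetical helper (not from the original post): returns every
// measurement found in the text, or an empty array if there is none.
function findMeasurements(text) {
  return text.match(unitRegex) || [];
}

// Sample strings taken from the question.
const samples = [
  "ABC 200g",
  "EFG 5,4 Kg",
  "HIL 2x20l",
  "ABC g200",
  "EFG kg 5,4",
  "HIL l 20x2",
];

for (const s of samples) {
  console.log(s, "->", findMeasurements(s));
}
```

Both orderings are handled by the same regex: "EFG 5,4 Kg" is matched by the first alternative and "ABC g200" by the second.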
CodePudding user response:
With your shown samples, please try the following regex.
(?:(?:(?:\d+)g|(?:g\d+))|(?:(?:l\s*\d+)|(?:\d+\s*l))|(?:(?:\d+,\d+\s*Kg)|(?:kg\s*\d+,\d+)))
Explanation: a detailed breakdown of the above regex.
(?: ##Starting 1st non-capturing group from here.
(?: ##Starting 2nd non-capturing group from here.
(?:\d+)g|(?:g\d+) ##Matching either digits followed by g OR g followed by digits (both conditions in non-capturing groups here).
) ##Closing 2nd non-capturing group here.
| ##Putting OR condition here.
(?: ##Starting 3rd non-capturing group here.
(?:l\s*\d+)|(?:\d+\s*l) ##Matching either l followed by 0 or more spaces followed by digits OR digits followed by 0 or more spaces followed by l.
) ##Closing 3rd non-capturing group here.
| ##Putting OR condition here.
(?: ##Starting 4th non-capturing group here.
(?:\d+,\d+\s*Kg)|(?:kg\s*\d+,\d+) ##Checking either digits followed by comma, digits, spaces and Kg OR kg followed by spaces, digits, comma and digits here.
) ##Closing 4th non-capturing group here.
) ##Closing 1st non-capturing group here.
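A small sketch of how this second pattern behaves in JavaScript, assuming it is used as written (case sensitive, with no flags in the answer itself). The variable names and the added g flag for collecting all matches are illustrative assumptions, not part of the answer.

```javascript
// Pattern from the answer, copied as a regex literal without flags.
const pattern = /(?:(?:(?:\d+)g|(?:g\d+))|(?:(?:l\s*\d+)|(?:\d+\s*l))|(?:(?:\d+,\d+\s*Kg)|(?:kg\s*\d+,\d+)))/;

// Every group is non-capturing, so exec() returns only the full match.
const single = pattern.exec("EFG 5,4 Kg");
console.log(single[0]);      // the matched text
console.log(single.length);  // 1, because there are no capture groups

// Assumption: add the g flag to collect the matches from each sample.
const globalPattern = new RegExp(pattern.source, "g");
const samples = ["ABC 200g", "ABC g200", "HIL 2x20l", "HIL l 20x2", "EFG kg 5,4"];
const results = samples.map((s) => s.match(globalPattern) || []);
console.log(results);
```

Because the pattern is case sensitive and spells out each sample form, it is tied closely to the shown inputs: "5,4 kg" with a lowercase k after the number, for instance, is not matched.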