Python Regex - Match All Numbers with Units [closed]

Time:10-05

I'm trying to use regex to match numbers that are followed by units, including spaces since sometimes the text isn't clean. For example, if I have some text like this:

blah blah 5/8" blah blah 60lbs blah blah 1 /8" blah blah 40 lbs 6oz

I would want to match:

5/8"
60lbs
1 /8"
40 lbs
6oz

I was thinking of keeping the units in a list, looping through it, and inserting each unit into the regex so that the expression matches "some number followed by that unit". However, I'm having trouble coming up with the part of the regex that matches everything before the unit.

Would appreciate any help! Thank you!

Note: I can also alter the text if that's easier. I thought maybe removing all spaces could be helpful but that might also complicate things more.

CodePudding user response:

I have a Python solution that works for most of your examples:

import re

text = """blah blah 5/8" blah blah 60lbs 
            blah blah 580/18" blah blah 60lbs 
            blah blah 1 /8" blah blah 40 lbs 6oz, 5Kg"""

units = ['"', 'lbs', 'oz', 'kg'] # add lower-case units of measure (the text is lower-cased before matching)
digit_regex = [r'(\d*?\/\d*?', r'(\d{2,}?'] # raw strings: [digits with "/" between, 2 or more digits]
results = []

for measure_unit in units:
    for digit in digit_regex:
        pattern = f'{digit}{measure_unit})'
        for match in re.findall(pattern, text.replace(' ', '').lower()):
            if match is not None and match !='':
                results.append(match)

print(results)

The output will be:

['5/8"', '580/18"', '1/8"', '18"', '60lbs', '60lbs', '40lbs']

It needs some further thinking: a match like '18"' shouldn't be present, since it's part of '580/18"', and single-digit values such as 6oz and 5Kg are missed because the second pattern requires two or more digits. But this should get you going.
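One possible way to avoid the stray '18"' is to build a single pattern from the unit list and scan the text once, so a number that has already been consumed as part of a fraction cannot be matched again. The following is only a sketch under some assumptions: the fraction part is taken to be "digits, optional spaces, slash, optional spaces, digits", units are matched case-insensitively, and the names unit_alt and pattern are made up for illustration.

```python
import re

text = '''blah blah 5/8" blah blah 60lbs
blah blah 580/18" blah blah 60lbs
blah blah 1 /8" blah blah 40 lbs 6oz, 5Kg'''

units = ['"', 'lbs', 'oz', 'kg']

# Escape each unit and try longer units first, so a short unit
# can never shadow a longer one that starts with the same letters.
unit_alt = '|'.join(re.escape(u) for u in sorted(units, key=len, reverse=True))

# One or more digits, an optional "/digits" part (spaces allowed
# around the slash), optional spaces, then one of the units.
pattern = re.compile(rf'\d+(?:\s*/\s*\d+)?\s*(?:{unit_alt})', re.IGNORECASE)

# Strip inner whitespace and lower-case each match after matching,
# instead of altering the whole text beforehand.
results = [re.sub(r'\s+', '', m).lower() for m in pattern.findall(text)]

print(results)
# ['5/8"', '60lbs', '580/18"', '60lbs', '1/8"', '40lbs', '6oz', '5kg']
```

Because the spaces are handled inside the pattern, the original text does not have to be modified, and the results come back in the order they appear in the text.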

CodePudding user response:

Add every unit you want to accept to the named group unit:

(?:(?P<num>(?:(?:([\+-]\s*)?[1-9]\d*)|0)(?:\s*\/\d*)?)(?:\s*(?P<unit>\"|lbs|oz))?)

In Python:

import re

test='blah blah 5/8" blah blah 60lbs blah blah 1 /8" blah blah 40 lbs 6oz'
units=['"','lbs','oz'] # define list of accepted units
pattern=fr'(?:(?P<num>(?:(?:([\+-]\s*)?[1-9]\d*)|0)(?:\s*\/\d*)?)(?:\s*(?P<unit>{"|".join(units)}))?)'
r = re.compile(pattern)
res=[m.groupdict() for m in r.finditer(test)] # one dict per match, with keys 'num' and 'unit'

Generated pattern:

 '(?:(?P<num>(?:(?:([\\+-]\\s*)?[1-9]\\d*)|0)(?:\\s*\\/\\d*)?)(?:\\s*(?P<unit>"|lbs|oz))?)'

res:

[{'num': '5/8', 'unit': '"'},
 {'num': '60', 'unit': 'lbs'},
 {'num': '1 /8', 'unit': '"'},
 {'num': '40', 'unit': 'lbs'},
 {'num': '6', 'unit': 'oz'}]
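Note that the unit group in this pattern is optional, so a bare number with no unit after it also matches, with unit set to None. If only numbers with units are wanted, the matches can be filtered afterwards. A small sketch, assuming a made-up sample string and using re.escape on the units as a precaution (the names sample, matches and with_units are only for illustration):

```python
import re

units = ['"', 'lbs', 'oz']  # accepted units
unit_alt = '|'.join(re.escape(u) for u in units)

# Same pattern as above, with the unit alternation built from the list.
pattern = re.compile(
    fr'(?:(?P<num>(?:(?:([\+-]\s*)?[1-9]\d*)|0)(?:\s*\/\d*)?)'
    fr'(?:\s*(?P<unit>{unit_alt}))?)'
)

sample = 'page 12 lists 40 lbs 6oz'  # hypothetical text containing a bare number

matches = [m.groupdict() for m in pattern.finditer(sample)]
# The bare "12" is matched too, with unit None.

# Keep only the matches that actually carry a unit.
with_units = [d for d in matches if d['unit'] is not None]

print(with_units)
# [{'num': '40', 'unit': 'lbs'}, {'num': '6', 'unit': 'oz'}]
```

Alternatively, removing the trailing ? after the unit group should make the unit mandatory, at the cost of no longer reporting unitless numbers at all.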
