How do you match these groups with Python regex?

Time:07-13

I have a weird case where this simple code is not functioning as expected:

import re

text = 'This Level: 75.3'
matches = re.search(r'(?:(?:\d{1,3},)(?:\d{3},)*(?:\d{3})|\d*)(?:\.\d+)?', text)

print(matches.group())

I keep getting a blank string returned... however, I would expect this to be 75.3. This works for other use cases, such as:

assert util.strip_str_to_float('7') == 7.0
assert util.strip_str_to_float('75') == 75.0
assert util.strip_str_to_float('75.5') == 75.5
assert util.strip_str_to_float('7.7.9') == 7.7
assert util.strip_str_to_float('1,298.3 Gold') == 1298.3

Ultimately, I'm trying to pull out and convert the first float from a given string... I wasn't expecting this test case to be a failure. It seems to be failing specifically when the matching does not start at the beginning of the string. The search seems to work fine if I remove the non-capturing groups, for example, this works:

matches = re.search(r'\d*\.\d+', text)

But this does not:

matches = re.search(r'\d*(?:\.\d+)?', text)

Any ideas...?

Answer:

It looks like you're allowing plain integers without the decimal part as well as decimals like ".5" without the whole number part. That's fine, but since both parts are optional, the pattern also matches when neither part is present, so you get zero-length matches. re.search returns the leftmost match, and at position 0 of 'This Level: 75.3' the empty match already succeeds, so the search stops there and never reaches 75.3.

This is also why your pattern r'\d*\.\d+' worked: the decimal part was required, so an empty match was not possible.
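
To see the difference directly, here is a small check using the text and the two simplified patterns from the question. It only prints what each search finds and where.

```python
import re

text = 'This Level: 75.3'

# Every part of this pattern is optional, so it can match the empty string.
# re.search takes the leftmost match, which is the empty one at index 0.
optional = re.search(r'\d*(?:\.\d+)?', text)
print(repr(optional.group()), optional.span())   # '' (0, 0)

# Here the decimal part is required, so no empty match is possible and
# the search keeps scanning until it reaches the number.
required = re.search(r'\d*\.\d+', text)
print(repr(required.group()), required.span())   # '75.3' (12, 16)
```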

pattern = r'\d{1,3}(?:,\d{3})*(?:\.\d+)?|\.\d+'

If I'm understanding the question right, this pattern should work. It's divided into two parts, so it looks for either:

  • a whole number with a decimal part optional, or
  • a required decimal part, with no whole number before it
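
As a rough sketch of how this pattern could be plugged into a helper: strip_str_to_float is the name used in the question's asserts, but the body below is only a guess at what such a function might look like, not the asker's actual util code. Returning None when nothing matches is also an assumption.

```python
import re

# Pattern from the answer: a comma-grouped whole number with an optional
# decimal part, or a bare decimal part such as ".5".
FLOAT_PATTERN = re.compile(r'\d{1,3}(?:,\d{3})*(?:\.\d+)?|\.\d+')

def strip_str_to_float(text):
    """Return the first number found in text as a float, or None (assumed behavior)."""
    match = FLOAT_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(',', ''))

print(strip_str_to_float('This Level: 75.3'))   # 75.3
print(strip_str_to_float('1,298.3 Gold'))       # 1298.3
```

One thing to watch for: because the whole-number part is limited to \d{1,3} plus comma groups, an ungrouped number with more than three digits, such as '1234.5', would only match its first three digits. If your inputs can contain numbers like that, the pattern would need another alternative for plain runs of digits.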