Python Regex solution for finding numbers not preceded by a set of characters

Time:01-10

I'm looking for a regex solution to find all numbers (arbitrary # of digits) that are not preceded by "p=" or "p = ". For example:

5.5 vs 9.9, p = 0.01 and p = 0.05

The regex should pick up 5.5 and 9.9, but not 0.01 or 0.05

I've tried using a negative lookbehind (?<!p=|p\s=\s)(\d?\.?\d) but it still returns 01 and 05.

Any help is appreciated. Thanks!

CodePudding user response:

Python's re module doesn't allow variable-width lookbehinds, but you can chain several fixed-width ones, and add one more to guard against matching in the middle of a number.

>>> import re
>>> inp = "5.5 vs 9.9, p = 0.01 and p = 0.05"
>>> re.findall(r'(?<!p=)(?<!p\s=\s)(?<![\d.])(\d?\.?\d+)', inp)
['5.5', '9.9']
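To make the fixed-width point concrete, here is a minimal sketch. It assumes the standard-library re module (the third-party regex package behaves differently), and it swaps in the number pattern \d+(?:\.\d+)? so that multi-digit values such as 12.34 are matched whole. The variable names are only illustrative.

```python
import re

inp = "5.5 vs 9.9, p = 0.01 and p = 0.05"

# One lookbehind whose alternatives have different widths ("p=" is 2 chars,
# "p\s=\s" is 4) is rejected by the standard re module at compile time.
try:
    re.compile(r'(?<!p=|p\s=\s)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Chaining separate fixed-width lookbehinds compiles fine.
# The last one, (?<![\d.]), stops a match from starting inside a number.
chained = re.compile(r'(?<!p=)(?<!p\s=\s)(?<![\d.])\d+(?:\.\d+)?')
numbers = chained.findall(inp)

print(variable_width_ok)
print(numbers)
```

Note that each lookbehind covers exactly one spacing. A variant like "p =0.01" is not excluded by this pattern, so you would need one more lookbehind per spacing you expect.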

CodePudding user response:

We can use re.findall followed by a list comprehension to filter the results:

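import re

inp = "5.5 vs 9.9, p = 0.01 and p = 0.05"
# Match plain numbers as well as numbers prefixed with "p =" (any spacing).
matches = re.findall(r'(?:p\s*=\s*)?\d+(?:\.\d+)?', inp)
# Keep only the matches that are a bare number.
result = [m for m in matches if re.search(r'^\d+(?:\.\d+)?$', m)]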
print(result)  # ['5.5', '9.9']

The trick here is to match all floats/integers as well as p = <some num> values, the latter first. We then retain only the numbers not preceded by p =.

CodePudding user response:

You could match what you don't want, and use an alternation with a capture group for what you want to keep. re.findall will then return the capture group 1 values, with an empty string wherever the first alternative matched.

\bp\s*=\s*\d+(?:\.\d+)?|(\d+(?:\.\d+)?)

Explanation

  • \bp\s*=\s*\d+(?:\.\d+)? Match p= with optional whitespace chars around the equals sign, and match 1+ digits with an optional decimal part
  • | Or
  • (\d+(?:\.\d+)?) Capture group 1, match 1+ digits with an optional decimal part

See a regex101 demo.

import re

s = r"5.5 vs 9.9, p = 0.01 and p = 0.05"
pattern = r"\bp\s*=\s*\d+(?:\.\d+)?|(\d+(?:\.\d+)?)"
# Drop the empty strings produced where the "p = ..." alternative matched.
print([v for v in re.findall(pattern, s) if v])

Output

['5.5', '9.9']
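If you need this in more than one place, the alternation approach can be wrapped in a small helper. This is only a sketch: the function name numbers_not_after_p and the conversion to float are my own additions, not part of the answer above. Because the first alternative uses \s*, it handles any spacing around the equals sign.

```python
import re

# Hypothetical helper; the name and the float conversion are illustrative.
# Alternative 1 consumes "p = <number>" without capturing it.
# Alternative 2 captures any other number in group 1.
_PATTERN = re.compile(r"\bp\s*=\s*\d+(?:\.\d+)?|(\d+(?:\.\d+)?)")


def numbers_not_after_p(text):
    """Return the numbers in `text` that do not follow 'p =' (any spacing)."""
    # group(1) is None whenever the "p = ..." alternative matched.
    return [float(m.group(1)) for m in _PATTERN.finditer(text) if m.group(1)]


print(numbers_not_after_p("5.5 vs 9.9, p = 0.01 and p=0.05"))  # [5.5, 9.9]
print(numbers_not_after_p("n = 120, p =0.3"))                  # [120.0]
```

Using finditer instead of findall makes the "unwanted" matches show up as None rather than empty strings, which keeps the filter explicit.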