Regex help, match numbers, stop at whitespace

Time:09-30

Given the following sample text:

Phase ID: 4 333 stuff
Phase ID: 4.5 333 stuff
Phase ID: 44.55 333 stuff

I'd like to capture just the number after the "Phase ID:". Here is what I currently have:

Phase ID:\K.*([0-9]*\.[0-9]|[0-9][^\s]+)

It works, except it also captures the 333. How can I stop the match after the first number?

CodePudding user response:

Match non-spaces after Phase ID:

(?<=Phase ID: )\S+



If you need to be more prescriptive with the match being a number:

(?<=Phase ID: )\d+(?:\.\d+)?
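
A minimal Python sketch of the two lookbehind patterns above, run against the sample lines from the question. The names `ANY_TOKEN`, `NUMBER`, and `extract` are made up for illustration, and it assumes exactly one space follows "Phase ID:", since Python's `re` only accepts fixed-width lookbehinds.

```python
import re

SAMPLES = [
    "Phase ID: 4 333 stuff",
    "Phase ID: 4.5 333 stuff",
    "Phase ID: 44.55 333 stuff",
]

# Fixed-width lookbehind: require "Phase ID: " before the match without consuming it.
ANY_TOKEN = re.compile(r"(?<=Phase ID: )\S+")
NUMBER = re.compile(r"(?<=Phase ID: )\d+(?:\.\d+)?")


def extract(pattern, line):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(line)
    return m.group(0) if m else None


for line in SAMPLES:
    print(extract(ANY_TOKEN, line), extract(NUMBER, line))
```

Both patterns return the same text for these samples. They differ only when the token after the label is not a number, in which case `NUMBER` finds nothing.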


CodePudding user response:

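import re

string2 = "Phase ID: 44.55 333 stuff"
# Capture digits with an optional decimal part; the group stops at the first space.
match = re.search(r"Phase ID: (\d+(?:\.\d+)?)", string2)
captured = float(match.group(1))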

CodePudding user response:

This part of the pattern is causing the problem:

[^\s]+

It matches one or more non-whitespace characters, with no upper limit, which is why the digits after the space end up in the match. If you change the pattern to something like:

Phase ID: ([0-9.]*).*

It will still match the entire line, but the digits following "Phase ID:" are captured in the first capture group (denoted by parentheses), which you can extract. The whitespace is not included in that group because it isn't part of [0-9.].
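
As a rough illustration of pulling out that first capture group, here is a short Python sketch. The variable names are placeholders, and `re.match` is used on the assumption that "Phase ID:" sits at the start of the line.

```python
import re

line = "Phase ID: 44.55 333 stuff"

# The pattern consumes the whole line, but only group 1 holds the number.
m = re.match(r"Phase ID: ([0-9.]*).*", line)
whole_match = m.group(0)  # everything the pattern matched
phase_id = m.group(1)     # only the characters matched by [0-9.]*

print(whole_match)
print(phase_id)
```

Note that `[0-9.]*` is permissive: it would also accept an empty string or something like `1.2.3`, so a stricter pattern may be preferable if the input is not trusted.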

I like to test regex on interactive sites like https://regex101.com/

CodePudding user response:

I think I would try:

import re

# Try a decimal number first, then fall back to a plain integer.
regex_string = r"\d+\.\d+|\d+"
pattern = re.compile(regex_string)
print(pattern.search("Phase ID: 4 333 stuff"))
print(pattern.search("4.5 333 stuff"))
print(pattern.search("Phase ID: 4455 333 stuff"))

CodePudding user response:

It works, except it also captures the 333. How can I stop the match after the first number?

You are also matching 333 because .* is greedy: it first matches all the way to the end of the line.

Then it backtracks until one of the alternatives can match. In this case [0-9][^\s]+ can match 33, so the pattern returns a match.
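
The backtracking can be observed with a small Python sketch. Python's built-in `re` does not support `\K`, so this version drops it and inspects the capture group instead. The helper name `greedy_capture` is made up for illustration.

```python
import re

SAMPLES = [
    "Phase ID: 4 333 stuff",
    "Phase ID: 4.5 333 stuff",
    "Phase ID: 44.55 333 stuff",
]

# The question's pattern without \K. The greedy .* runs to the end of the line,
# then gives characters back until one alternative can match.
GREEDY = re.compile(r"Phase ID:.*([0-9]*\.[0-9]|[0-9][^\s]+)")


def greedy_capture(line):
    """Return what the alternation matched after .* finished backtracking."""
    m = GREEDY.search(line)
    return m.group(1) if m else None


for line in SAMPLES:
    print(repr(greedy_capture(line)))
```

For every sample the group lands on the trailing digits rather than on the number next to the label, which is the behaviour described above.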

As you are already using \K you can write the pattern as:

Phase ID:\s*\K\d+(?:\.\d+)?

Explanation

  • Phase ID: Match literally
  • \s*\K Match optional whitespace chars, then forget everything matched so far
  • \d+ Match 1 or more digits
  • (?:\.\d+)? Match an optional decimal part
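
If you are working in Python, note that the built-in `re` module does not support `\K`. A capture group gives the same result. The sketch below is one possible equivalent, with `PHASE_ID` and `phase_id` as hypothetical names.

```python
import re

# Capture-group stand-in for \K: the number is read from group 1.
PHASE_ID = re.compile(r"Phase ID:\s*(\d+(?:\.\d+)?)")


def phase_id(line):
    """Return the number after "Phase ID:" as a string, or None if absent."""
    m = PHASE_ID.search(line)
    return m.group(1) if m else None


for line in (
    "Phase ID: 4 333 stuff",
    "Phase ID: 4.5 333 stuff",
    "Phase ID:   44.55 333 stuff",
):
    print(phase_id(line))
```

Unlike a lookbehind, the `\s*` here tolerates any amount of whitespace after the colon, since the variable-width part is matched normally rather than asserted.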

