Using RegEx, what's the best way to capture groups of digits, ignoring any whitespace in them

Time:05-18

Given the following string...

ABC DEF GHI: 319 022 6543 QRS : 531 450

I'm trying to extract all ranges that start/end with a digit, and which may contain whitespace, but I want that whitespace itself removed.

For instance, the above should yield two results (since there are two 'ranges' that match what I am looking for)...

3190226543
531450

My first thought was this, but this matches the spaces between the letters...

([\d\s]+)

Then I tried this, but it didn't seem to have any effect...

([\d+\s*]+)

This one comes close, but it's grabbing the trailing spaces too. Also, it grabs the whitespace but doesn't remove it.

(\d[\d\s]+)

If it's impossible to remove the spaces in a single statement, I can always post-process the groups if I can properly extract them. That most recent statement comes close, but how do I say it doesn't end with whitespace, but only a digit?

So what's the missing expression? Also, since people sometimes just post an answer, it would be helpful to explain the RegEx too, to help others figure out how to do this. I for one would love not just the solution, but an explanation. :)

Note: I know there can be some variations between RegEx on different platforms, so it's fine if those differences are left up to the reader. I'm more interested in understanding the basic mechanics of the regex itself than the syntax. That said, if it helps, I'm using both Swift and C#.

Answer:

You cannot get rid of whitespace from inside the match value within a single match operation. You will need to remove spaces as a post-processing step.

To match a string that starts with a digit, optionally followed by any number of digits or whitespace characters and then a final digit, you can use

\d(?:[\d\s]*\d)?

Details:

  • \d - a digit
  • (?:[\d\s]*\d)? - an optional non-capturing group matching
    • [\d\s]* - zero or more whitespace characters or digits
    • \d - a digit.
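
As a rough sketch of the two steps (match first, then strip the whitespace), here is how it could look in Python. Python is used only for illustration, and the names DIGIT_RUN and extract_digit_runs are made up for this example; the pattern itself should carry over unchanged to the .NET and Swift regex engines, though the surrounding API calls will differ.

```python
import re

# Pattern from the answer: a digit, optionally followed by any run of
# digits/whitespace that ends in a digit. (DIGIT_RUN is a hypothetical name.)
DIGIT_RUN = re.compile(r"\d(?:[\d\s]*\d)?")


def extract_digit_runs(text):
    """Return each digit run in `text` with its inner whitespace removed."""
    # Step 1: find every match; inner whitespace is still present here.
    raw_matches = DIGIT_RUN.findall(text)
    # Step 2: post-process each match by deleting all whitespace.
    return [re.sub(r"\s+", "", match) for match in raw_matches]


sample = "ABC DEF GHI: 319 022 6543 QRS : 531 450"
print(DIGIT_RUN.findall(sample))   # ['319 022 6543', '531 450']
print(extract_digit_runs(sample))  # ['3190226543', '531450']
```

Because the group is optional, a lone digit such as the 7 in "a 7 b" still matches on its own, and because the group must end in \d, the match can never end in whitespace.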

