Refine decimal and hex number capturing

Time:09-29

I'm working on a regex to capture decimal and hex numbers. I'm satisfied with it, but I would like to tune it a bit.

The formula itself:

\b([[:xdigit:]]{2}(?:\s)?)+\b|\b(-|\.)?[0-9]+(x?([[:xdigit:]]+))?

...and the test dummies:

raw data=58 4b 20 00    :-1
Machine\Head.cpp:298
123.45
0xABCDEF123456
Ab2537ff
Test aa
Testaa
Test1
Test 1
-25
.375

It works perfectly for me except for one thing that I can't figure out how to correct. The last two examples (-25 and .375) are captured, but only the digits. I would also like to capture the - and the . because, in these cases, they are part of the number itself.

Could anyone point me in the right direction? I tried lookbehind and lookahead without success.

Thank you all!

Ben

CodePudding user response:

You can modify the pattern a bit:

\b(?:[[:xdigit:]]{2}\s?)+\b|[-.]?\b[0-9]+(?:x?[[:xdigit:]]+)?

See the regex demo. Note that (-|\.)? is changed into [-.]? and the word boundary is moved after this optional pattern.
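To see why moving the word boundary matters, here is a minimal Python sketch that isolates just the sign-and-digits part of the pattern. The names `BOUNDARY_FIRST`, `SIGN_FIRST`, and `first_match` are made up for this illustration. With `\b` first, there is no word boundary between the start of the string and a `-` or `.`, so the engine skips ahead and the sign is lost.

```python
import re

# Boundary placed before the optional sign/dot (as in the question).
BOUNDARY_FIRST = re.compile(r"\b[-.]?[0-9]+")
# Optional sign/dot first, then the boundary (as in the answer).
SIGN_FIRST = re.compile(r"[-.]?\b[0-9]+")

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

for sample in ["-25", ".375"]:
    print(sample, first_match(BOUNDARY_FIRST, sample), first_match(SIGN_FIRST, sample))
```

The boundary still prevents a match inside a word such as Test1, because there is no boundary between `t` and `1`.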

Note that this pattern is not very efficient: you should avoid constructs like (ab?)+ inside regular expressions, because an optional part inside a repeated group can cause unnecessary backtracking. A better regex would look like

\b[[:xdigit:]]{2}(?:\s+[[:xdigit:]]{2})*\b|[-.]?\b[0-9]+(?:x?[[:xdigit:]]+)?

where (?:[[:xdigit:]]{2}\s?)+ is replaced with [[:xdigit:]]{2}(?:\s+[[:xdigit:]]{2})* that matches hex char pairs in a more efficient way.
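As a rough check, the improved pattern can be tried against some of the test strings from the question. This is only a sketch in Python, whose `re` module does not support the POSIX class `[[:xdigit:]]`, so it is written out here as `[0-9A-Fa-f]` (an assumption about the equivalent character set). `NUMBER_RE` and `find_numbers` are hypothetical names.

```python
import re

# [[:xdigit:]] is spelled [0-9A-Fa-f] because Python's re lacks POSIX classes.
NUMBER_RE = re.compile(
    r"\b[0-9A-Fa-f]{2}(?:\s+[0-9A-Fa-f]{2})*\b"  # runs of two-digit hex pairs
    r"|[-.]?\b[0-9]+(?:x?[0-9A-Fa-f]+)?"         # optional -/. then digits, optional hex tail
)

def find_numbers(text):
    """Return every substring matched by NUMBER_RE, in order."""
    return NUMBER_RE.findall(text)

samples = ["raw data=58 4b 20 00    :-1", "0xABCDEF123456", "-25", ".375", "Testaa"]
for sample in samples:
    print(repr(sample), "->", find_numbers(sample))
```

Because the pattern has no capturing groups, `findall` returns the full matches as strings, so -25 and .375 come back with their leading character included.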
