Regular expression to match integer literal

Time:11-24

I was thinking about parsing a list of integers (from a property string). However, I would like to go beyond just positive and negative decimal values and parse any string that represents a Java integer literal (JLS 17) as it can appear in source code. I would also like to be lenient with regard to any prefixes, separators and suffixes around the integers themselves. In other words, I want to find them using repeated calls to Matcher.find().

Is there a regular expression that matches all possible Java integer literals? It doesn't need to check the upper and lower bounds.


Even though I did explicitly link to the JLS, I'll show some valid and invalid numbers:

  • -1: the 1 is matched, but the minus is a unary operator (I'll adjust if necessary)
  • 0x00_00_00_0F: the value fifteen is matched as hex digits, with underscores separating the bytes
  • 0b0000_1111: the value fifteen in binary is matched
  • 017: the octal value of fifteen is matched
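As a quick sanity check of the examples above, the following sketch declares each literal form in Java source and prints its value. The class and field names are hypothetical and only serve the illustration.

```java
public class LiteralExamples {
    // Each constant uses one of the literal forms from the list above.
    static final int HEX = 0x00_00_00_0F;   // hexadecimal with underscores
    static final int BINARY = 0b0000_1111;  // binary with an underscore
    static final int OCTAL = 017;           // octal: leading zero
    static final int NEGATED = -1;          // unary minus applied to the literal 1

    public static void main(String[] args) {
        System.out.println(HEX);     // prints 15
        System.out.println(BINARY);  // prints 15
        System.out.println(OCTAL);   // prints 15
        System.out.println(NEGATED); // prints -1
    }
}
```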

CodePudding user response:

Something like this:

decimal:
(?:0|[1-9](?:_*[0-9])*)[lL]?

hexadecimal:
0[xX][a-fA-F0-9](?:_*[a-fA-F0-9])*[lL]?

octal:
0(?:_*[0-7])+[lL]?

binary:
0[bB][01](?:_*[01])*[lL]?

All together (in free-spacing mode):

(?:
    0
    (?:
        [xX] [a-fA-F0-9] (?: _* [a-fA-F0-9] )*
      |
        (?: _* [0-7] )+
      |
        [bB] [01] (?: _* [01] )*
    )?
  |
    [1-9] (?: _* [0-9] )*
)
[lL]?
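
To show how the combined pattern might be used with repeated calls to Matcher.find(), here is a minimal sketch. It compiles the free-spacing pattern with Pattern.COMMENTS, which makes Java ignore the whitespace. Two details are assumptions on my part rather than something the question requires: the prefix is written as [xX] so that 0X is accepted, and the octal branch allows underscores directly after the leading zero (both forms are permitted by the JLS grammar). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntegerLiteralFinder {
    // Combined pattern in free-spacing mode; whitespace is ignored
    // because of the Pattern.COMMENTS flag.
    static final Pattern INT_LITERAL = Pattern.compile(
          "(?:\n"
        + "    0\n"
        + "    (?:\n"
        + "        [xX] [a-fA-F0-9] (?: _* [a-fA-F0-9] )*\n" // hexadecimal
        + "      |\n"
        + "        (?: _* [0-7] )+\n"                        // octal
        + "      |\n"
        + "        [bB] [01] (?: _* [01] )*\n"               // binary
        + "    )?\n"
        + "  |\n"
        + "    [1-9] (?: _* [0-9] )*\n"                      // decimal
        + ")\n"
        + "[lL]?",                                           // optional long suffix
        Pattern.COMMENTS);

    // Collects every substring that looks like an integer literal.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = INT_LITERAL.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [1, 0x00_00_00_0F, 0b0000_1111, 017, 42L]
        System.out.println(findAll("-1, 0x00_00_00_0F; 0b0000_1111 017 42L"));
    }
}
```

Because find() is unanchored, this sketch also picks up digits embedded in other tokens (for example the 123 in abc123). If that matters for your input, you would probably want to add boundary checks around the pattern.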


CodePudding user response:

Well... in the simplest terms, base 2, 8, and 10 numbers could use the same pattern, since their digits are all numeric characters. BUT you probably want an expression for each type. The problem is that you did not make your intent clear. I am going on the assumption that you want the expression to validate which base a particular value is in.

String base10Regex = "[0-9]+";
String base2Regex = "[0-1]+";
String base8Regex = "[0-7]+";
String base16Regex = "^[0-9A-F]+$";

For the octal and decimal values, you will need to prepend a check for an optional sign character to the expression: "^[+-]?". For hex values, if you expect the values to start with "0x", I suggest prepending those literal characters to the expression.
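
Put together, the per-base validation could look roughly like the sketch below. The anchoring of every pattern, the optional sign, and the "0x" prefix follow the suggestions above; the class and helper names are hypothetical. Note that a digit string such as 101 is valid in several bases at once, so these patterns can only tell you whether a value is acceptable for a given base, not which base was intended.

```java
import java.util.regex.Pattern;

public class BaseValidator {
    // Anchored patterns, one per base (an assumed reading of the answer above).
    static final Pattern BASE2 = Pattern.compile("^[01]+$");
    static final Pattern BASE8 = Pattern.compile("^[+-]?[0-7]+$");   // optional sign
    static final Pattern BASE10 = Pattern.compile("^[+-]?[0-9]+$");  // optional sign
    static final Pattern BASE16 = Pattern.compile("^0x[0-9A-F]+$");  // literal 0x prefix

    // Returns true if the whole string is valid for the given base pattern.
    static boolean isValid(Pattern base, String value) {
        return base.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(BASE2, "0101"));   // prints true
        System.out.println(isValid(BASE8, "-17"));    // prints true
        System.out.println(isValid(BASE8, "18"));     // prints false (8 is not an octal digit)
        System.out.println(isValid(BASE10, "+42"));   // prints true
        System.out.println(isValid(BASE16, "0xFF"));  // prints true
    }
}
```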
