I was thinking about parsing a list of integers (from a property string). However, I would like to go beyond plain positive and negative decimal values and parse any string that represents a Java integer literal (JLS 17) as it can appear in source code. Similarly, I would like to be lenient with regard to any prefixes, separators and trailing text around the integers themselves. In other words, I want to find them using repeated calls to Matcher.find().
Is there a regular expression that matches all possible Java integer literals? It doesn't need to check the upper and lower bounds.
Even though I did explicitly link to the JLS, I'll show some example numbers:
-1: the 1 is matched, but the minus is a unary operator (I'll adjust if necessary)
0x00_00_00_0F: the value fifteen is matched as hex digits, with underscores separating the bytes
0b0000_1111: the value fifteen in binary is matched
017: the octal value of fifteen is matched
CodePudding user response:
Something like this:
decimal:
(?:0|[1-9](?:_*[0-9])*)[lL]?
hexadecimal:
0[xX][a-fA-F0-9](?:_*[a-fA-F0-9])*[lL]?
octal:
0(?:_*[0-7])+[lL]?
binary:
0[bB][01](?:_*[01])*[lL]?
All together (in free-spacing mode):
(?:
0
(?:
[xX] [a-fA-F0-9] (?: _* [a-fA-F0-9] )*
|
_* [0-7] (?: _* [0-7] )*
|
[bB] [01] (?: _* [01] )*
)?
|
[1-9] (?: _* [0-9] )*
)
[lL]?
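To use the combined pattern with repeated calls to Matcher.find(), a minimal sketch could look like the following. The class and method names are made up for illustration. The pattern is compiled with Pattern.COMMENTS so the free-spacing layout can be kept; it assumes the hex prefix may also be an upper-case X and that underscores may directly follow the leading 0 of an octal literal, as the JLS grammar allows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntegerLiteralFinder {
    // Combined pattern in free-spacing mode; Pattern.COMMENTS makes the
    // regex engine ignore the whitespace used for layout.
    // Assumptions: [xX] accepts both hex prefixes, and the octal branch
    // allows underscores right after the leading 0 (for example 0_17).
    private static final Pattern INTEGER_LITERAL = Pattern.compile(
        "(?: 0 (?: [xX] [a-fA-F0-9] (?: _* [a-fA-F0-9] )*"
      + "        | [bB] [01] (?: _* [01] )*"
      + "        | _* [0-7] (?: _* [0-7] )*"
      + "        )?"
      + "  | [1-9] (?: _* [0-9] )*"
      + ") [lL]?",
        Pattern.COMMENTS);

    // Collects every substring of the input that looks like an integer literal.
    public static List<String> findLiterals(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = INTEGER_LITERAL.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [1, 0x00_00_00_0F, 0b0000_1111, 017]
        System.out.println(findLiterals("-1, 0x00_00_00_0F; 0b0000_1111 | 017"));
    }
}
```

Because find() scans leniently, digits embedded in other text are also reported (for example the 1 in "x1"); if that matters, lookarounds around the pattern would be one way to tighten it.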
CodePudding user response:
Well... in the simplest terms, base 2, 8, and 10 numbers could all use the same pattern, since their digits are all numeric characters. BUT you probably want a separate expression for each base. The problem is that you did not make your intent clear. I am going on the assumption that you want the expression to validate which base a particular value is in.
String base10Regex = "[0-9]+";
String base2Regex = "[0-1]+";
String base8Regex = "[0-7]+";
String base16Regex = "^[0-9A-F]+$";
For the octal and decimal values, you will need to prepend a check for an optional sign character to your expression: "^[+-]?". For hex values, if you expect the values to start with "0x", I suggest prepending those literal characters to the expression.
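Putting those pieces together, a rough sketch of the validation could look like this. The class and method names are hypothetical, the patterns are anchored so the whole string has to match, and it assumes hex values always carry a literal "0x" prefix and upper-case digits, as in the expression above.

```java
import java.util.regex.Pattern;

public class BaseValidator {
    // Hypothetical helper; patterns adapted from the expressions above.
    private static final Pattern BASE2 = Pattern.compile("^[0-1]+$");
    // Optional sign character for octal and decimal values.
    private static final Pattern BASE8 = Pattern.compile("^[+-]?[0-7]+$");
    private static final Pattern BASE10 = Pattern.compile("^[+-]?[0-9]+$");
    // Assumption: hex values always start with a literal "0x".
    private static final Pattern BASE16 = Pattern.compile("^0x[0-9A-F]+$");

    // Returns true if the value is well-formed for the given radix.
    public static boolean isValid(String value, int radix) {
        switch (radix) {
            case 2:  return BASE2.matcher(value).matches();
            case 8:  return BASE8.matcher(value).matches();
            case 10: return BASE10.matcher(value).matches();
            case 16: return BASE16.matcher(value).matches();
            default: throw new IllegalArgumentException("unsupported radix: " + radix);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("-17", 8));   // prints true
        System.out.println(isValid("18", 8));    // prints false
        System.out.println(isValid("0xFF", 16)); // prints true
    }
}
```

Note that this validates whole strings only; it does not handle underscores or the l/L suffix of Java literals.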