The regular expression (regex) shown below is an example of a pattern that matches a variable name or function name in an older language such as C.
[a-zA-Z_]+[a-zA-Z_0-9]*
Partially translated, we have:
- [a-zA-Z_]+ means one or more of [a-zA-Z_]
- [a-zA-Z_0-9]* means zero or more of [a-zA-Z_0-9]
Translated fully into plain English, the regular expression above means: one or more letters or underscores on the left, followed by zero or more characters taken from the set of all letters (a through z and A through Z), underscores, and the digits zero through nine.
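To check the translation, here is a minimal sketch using Python's `re.fullmatch`, which succeeds only when the entire string matches the pattern. The names `C_NAME` and `is_c_name` are illustrative assumptions. Note that `8apple` is rejected, which is exactly the limitation this question is about.

```python
import re

# The pattern from the question: one or more letters or underscores,
# then any mix of letters, underscores and digits.
C_NAME = re.compile(r"[a-zA-Z_]+[a-zA-Z_0-9]*")


def is_c_name(token: str) -> bool:
    # fullmatch requires the whole token to match, not just a substring.
    return C_NAME.fullmatch(token) is not None


for token in ["main", "get_user_input", "_", "_callable", "8apple", "42"]:
    print(token, is_c_name(token))
```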
My question is: what regex would match variable names while allowing digits on the left, but not match numeric literals such as 6.4222?
Ideally, the regex would not match any of the following integer literals:
INTEGER | REMARK |
---|---|
42 | The answer to the ultimate question of life, the universe and everything |
052 | Octal (base 8) |
0x2a | Hexadecimal (base 16) with a lower-case letter a |
0X2A | Hexadecimal (base 16) with an upper-case letter A |
0b101010 | Binary |
The regular expression (regex) should match all of the following strings:
0orange
1kiwi
8apple
main
get_user_input
_
_callable
Answer:
The following regular expression matches both of these:
- variable names in the C programming language
- C variable names with one or more digits (0 through 9) prepended on the left
\b[0-9]*(?:(?!x)[a-zA-Z_])+[a-zA-Z_0-9]*\b
String | is_match |
---|---|
052 | false |
789321 | false |
25.122 | false |
0x2a | false |
INTEGER | true |
REMARK | true |
0X2A | true |
0b101010 | true |
0orange | true |
8apple | true |
1kiwi | true |
main | true |
get_user_input | true |
_ | true |
_callable | true |
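The regular expression does not match decimal or octal integer (int) constants, lower-case hexadecimal constants such as 0x2a, or floating-point (float) constants. As the table shows, it still matches 0X2A and 0b101010.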
You can try \b[0-9]*(?:(?!x)[a-zA-Z_])+[a-zA-Z_0-9]*\b on regex101.com.
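To reproduce the table outside regex101, here is a small sketch using Python's `re` module. The name `ANSWER_1` and the extra test string `xyz` are assumptions added for illustration. `re.search` is used because, like regex101's default, it looks for a match anywhere in the string. One thing the table does not show: because of `(?!x)`, a name whose first non-digit character is a lower-case x, such as `xyz`, is not matched.

```python
import re

# The pattern from this answer. (?!x) prevents a lower-case hexadecimal
# literal such as 0x2a from matching.
ANSWER_1 = re.compile(r"\b[0-9]*(?:(?!x)[a-zA-Z_])+[a-zA-Z_0-9]*\b")

cases = ["052", "789321", "25.122", "0x2a", "0X2A", "0b101010",
         "0orange", "1kiwi", "8apple", "main", "get_user_input", "_",
         "_callable", "xyz"]

for s in cases:
    # search looks anywhere in the string; the \b anchors stop a match
    # from starting or ending inside a run of word characters.
    print(f"{s:15} {ANSWER_1.search(s) is not None}")
```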
Answer:
[a-zA-Z_0-9]*[a-zA-Z_][a-zA-Z_0-9]*
The [a-zA-Z_] in the middle requires at least one letter or underscore somewhere in the token, so all-digit strings such as 42 and 052 are rejected when the whole token must match.
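A sketch of how this pattern behaves with Python's `re.fullmatch`, which requires the entire token to match. It accepts 0x2a, 0X2A and 0b101010, which the question would ideally reject. The second pattern, `NOT_HEX_OR_BINARY`, is a hypothetical extension (not from either answer) that adds a negative lookahead for complete hexadecimal and binary literals; treat it as a starting point, not a full tokenizer.

```python
import re

# Pattern from this answer: at least one letter or underscore somewhere.
ANSWER_2 = re.compile(r"[a-zA-Z_0-9]*[a-zA-Z_][a-zA-Z_0-9]*")

# Hypothetical extension: additionally reject tokens that are complete
# hexadecimal (0x2a, 0X2A) or binary (0b101010) literals.
NOT_HEX_OR_BINARY = re.compile(
    r"(?!0[xX][0-9a-fA-F]+\Z|0[bB][01]+\Z)[a-zA-Z_0-9]*[a-zA-Z_][a-zA-Z_0-9]*"
)

tokens = ["42", "052", "6.4222", "0x2a", "0X2A", "0b101010",
          "0orange", "1kiwi", "8apple", "main", "_", "_callable", "xyz"]

for token in tokens:
    # fullmatch tests the whole token, so "6.4222" cannot match.
    print(f"{token:10} answer_2={bool(ANSWER_2.fullmatch(token))} "
          f"extended={bool(NOT_HEX_OR_BINARY.fullmatch(token))}")
```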