I need to check whether a string is a valid identifier.
Pattern
A word is considered an identifier if
- The word doesn't contain any character other than alphanumeric characters.
- The word doesn't start with a digit.
Input
The input string will not contain any leading or trailing spaces or other white-space characters.
Code
I tried the following regular expressions:
\D[a-zA-Z]\w*\D
[ \t\n][a-zA-Z]\w*[ \t\n]
^\D[a-zA-Z]\w*$
None of them works.
How can I achieve this?
CodePudding user response:
Note that in your ^\D[a-zA-Z]\w*$ regex, \D can match non-alphanumeric chars, since \D matches any non-digit char, and \w also matches an underscore, which is not an alphanumeric char.
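To make the point above concrete, here is a small Python sketch that runs the asker's third pattern against a few inputs. The helper name attempt_matches is made up for this illustration; only the pattern itself comes from the question.

```python
import re

# The asker's third attempt, copied verbatim from the question.
ATTEMPT = re.compile(r"^\D[a-zA-Z]\w*$")

def attempt_matches(word):
    """Return True if the asker's pattern matches the word (hypothetical helper)."""
    return ATTEMPT.search(word) is not None

# Rejected: \D and [a-zA-Z] each consume one character, so a
# one-letter identifier can never match.
print(attempt_matches("a"))    # False
# Accepted: \D matches any non-digit, including a hyphen.
print(attempt_matches("-ab"))  # True
# Accepted: \w includes the underscore.
print(attempt_matches("ab_"))  # True
```

So the pattern is both too strict (it needs at least two characters) and too loose (it lets punctuation and underscores through).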
I suggest
^(?![0-9])[A-Za-z0-9]*$
It matches
^ - start of string
(?![0-9]) - no digit allowed immediately to the right of the current location
[A-Za-z0-9]* - zero or more ASCII letters/digits
$ - end of string.
See the regex demo.
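A minimal Python sketch of the lookahead pattern, in case the online demo is not available. The function name matches_lookahead is an assumption for this example; the pattern is the one from the answer, unchanged.

```python
import re

# Pattern from the answer above, unchanged.
LOOKAHEAD = re.compile(r"^(?![0-9])[A-Za-z0-9]*$")

def matches_lookahead(word):
    """Return True if the lookahead pattern matches the word (hypothetical helper)."""
    return LOOKAHEAD.search(word) is not None

for word in ["aB123", "1Ab", "abc_123", ""]:
    print(repr(word), matches_lookahead(word))
# 'aB123' True
# '1Ab' False
# 'abc_123' False
# '' True
```

Because of the * quantifier, the empty string also matches. If an empty identifier should be rejected, replacing * with + is one possible adjustment; the question does not say which behaviour is wanted.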
CodePudding user response:
\D matches any non-digit character: not only letters but also punctuation characters, whitespace characters, etc., and you definitely do not need those at the beginning.
You can use ^[A-Za-z][A-Za-z0-9]*$, which can be described as
^ : Start of string
[A-Za-z] : An ASCII letter
[A-Za-z0-9]* : An alphanumeric character, zero or more times
$ : End of string
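As a sketch of how this could be used from code, here is the same pattern wrapped in a Python helper. The name is_identifier is hypothetical, and re.fullmatch is used in place of the explicit ^ and $ anchors.

```python
import re

# Same character classes as the pattern above; the anchors are
# supplied by fullmatch() instead of ^ and $.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(word):
    """Return True if word is a letter followed by letters/digits only."""
    return IDENTIFIER.fullmatch(word) is not None

print(is_identifier("a2b3"))   # True
print(is_identifier("12abc"))  # False: starts with a digit
print(is_identifier("r2-d2"))  # False: contains a hyphen
print(is_identifier(""))       # False: at least one letter is required
```

One reason to prefer fullmatch in Python is that $ also matches just before a trailing newline. The question rules out trailing white-space, so either form should work for the stated input.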
CodePudding user response:
An even simpler pattern for an identifier, which avoids the negative lookahead used in Wiktor's answer:
^[^0-9][A-Za-z0-9]*$
Decomposed and explained:
^[^0-9] : Word starts (^) not ([^) with a number (0-9]). More exactly, the first char is any non-digit, so it need not be a letter, and the second character can be a digit!
[A-Za-z0-9]* : Word doesn't contain any character other than alpha-numeric characters (not even hyphen or underscore) until the end ($).
See demo on regex101.
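One caveat is worth checking in code: [^0-9] accepts any non-digit as the first character, not only letters. The sketch below compares it with a letter-only first character; the function name compare is an assumption for this example.

```python
import re

NOT_DIGIT_FIRST = re.compile(r"^[^0-9][A-Za-z0-9]*$")  # pattern from this answer
LETTER_FIRST = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")   # letter-only first character

def compare(word):
    """Return (matches not-digit-first, matches letter-first) for word."""
    return (NOT_DIGIT_FIRST.search(word) is not None,
            LETTER_FIRST.search(word) is not None)

for word in ["abc", "1Ab", "_abc", "-x1"]:
    print(word, compare(word))
# abc (True, True)
# 1Ab (False, False)
# _abc (True, False)
# -x1 (True, False)
```

The two patterns only differ when the first character is neither a letter nor a digit, which the first rule in the question forbids.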
Positive alternative
As already suggested by Arvind Kumar Avinash: if (according to both rules) the first char must be a letter rather than merely a non-digit, we can change the first part of the regex above from "not-numeric" to "only-alpha".
^[A-Za-z][A-Za-z0-9]*$
Explained:
[A-Za-z] : first char must be an alpha
[A-Za-z0-9]* : optional second and following chars can be any alpha-numeric
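Same result for the tests below, except that a leading non-alphanumeric character such as an underscore is now rejected too; see demo on regex101.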
Tests
input | result | reason |
---|---|---|
aB123 | matches identifier | |
Ab123 | matches identifier | |
XXXX12YZ | matches identifier | |
a2b3 | matches identifier | |
a | matches identifier | |
Z | matches identifier | |
0 | no match | starts with a digit |
1Ab | no match | starts with a digit |
12abc | no match | starts with a digit |
abc_123 | no match | contains underscore, not alphanum |
r2-d2 | no match | contains hyphen, not alphanum |
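
The table can also be checked mechanically. The following Python sketch runs the three patterns from the answers against the rows above; the names PATTERNS, CASES and failing_cases are made up for this example, and the expected values are copied from the table.

```python
import re

# The three patterns proposed in the answers.
PATTERNS = {
    "lookahead": re.compile(r"^(?![0-9])[A-Za-z0-9]*$"),
    "not-digit-first": re.compile(r"^[^0-9][A-Za-z0-9]*$"),
    "letter-first": re.compile(r"^[A-Za-z][A-Za-z0-9]*$"),
}

# (input, expected to match?) pairs taken from the table.
CASES = [
    ("aB123", True), ("Ab123", True), ("XXXX12YZ", True),
    ("a2b3", True), ("a", True), ("Z", True),
    ("0", False), ("1Ab", False), ("12abc", False),
    ("abc_123", False), ("r2-d2", False),
]

def failing_cases():
    """Return (pattern name, input) pairs that disagree with the table."""
    return [
        (name, word)
        for name, pattern in PATTERNS.items()
        for word, expected in CASES
        if (pattern.search(word) is not None) != expected
    ]

print(failing_cases())  # []
```

All three patterns agree with the table, so these rows alone do not distinguish them; inputs such as an empty string or a leading underscore would be needed for that.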