Regular expression to find words starting with no number

Time:12-30

I need to match a string with an identifier.

Pattern

A word is considered an identifier if:

  1. The word contains no characters other than alphanumeric characters.
  2. The word doesn't start with a digit.

Input

The given input string will not contain any leading or trailing spaces or other whitespace characters.

Code

I tried using the following regular expressions:

  1. \D[a-zA-Z]\w*\D
  2. [ \t\n][a-zA-Z]\w*[ \t\n]
  3. ^\D[a-zA-Z]\w*$

None of them works.

How can I achieve this?

CodePudding user response:

Note that in your ^\D[a-zA-Z]\w*$ regex, \D can match non-alphanumeric characters, since \D matches any non-digit character, and \w also matches the underscore, which is not an alphanumeric character.
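To make the two problems concrete, here is a small Python sketch (Python's re module is an assumption; the question does not name a language). It runs the third attempt from the question against a few hand-picked sample words. Both "-ab" and "ab_c" are accepted although they are not identifiers, while "a" and "a1" are rejected because \D and [a-zA-Z] each consume one character.

```python
import re

# Third attempt from the question, copied unchanged.
attempt = re.compile(r"^\D[a-zA-Z]\w*$")

# Sample words chosen for illustration only.
samples = ["-ab", "ab_c", "a", "a1", "ab1"]
results = {s: attempt.search(s) is not None for s in samples}

for s, matched in results.items():
    print(f"{s!r}: {matched}")
```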

I suggest

^(?![0-9])[A-Za-z0-9]*$

It matches

  • ^ - start of string
  • (?![0-9]) - no digit allowed immediately to the right of the current location
  • [A-Za-z0-9]* - zero or more ASCII letters/digits
  • $ - end of string.

See the regex demo.
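A minimal Python sketch of how this pattern might be applied (the helper name is_identifier and the use of Python are assumptions, not part of the answer). Note that because of the * quantifier the empty string also passes; if that is not wanted, replacing * with + would reject it, but whether empty input can occur is not stated in the question.

```python
import re

LOOKAHEAD = re.compile(r"^(?![0-9])[A-Za-z0-9]*$")

def is_identifier(word: str) -> bool:
    """Return True if the lookahead pattern accepts the word (hypothetical helper)."""
    return LOOKAHEAD.search(word) is not None

# Sample words chosen for illustration; the last one is the empty string.
for word in ["a2b3", "1Ab", "abc_123", ""]:
    print(repr(word), is_identifier(word))
```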

CodePudding user response:

\D matches any non-digit character, which includes not only letters but also punctuation, whitespace, and so on, so you definitely do not want it at the beginning.

You can use ^[A-Za-z][A-Za-z0-9]*$ which can be described as

  • ^: Start of string
  • [A-Za-z]: An ASCII letter
  • [A-Za-z0-9]*: An alphanumeric character, zero or more times
  • $: End of string

Demo
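As a quick illustration, the pattern can be used to filter a list of candidate words. This is only a sketch in Python (an assumed language), with made-up sample words; since the first character class is mandatory, the empty string is rejected as well.

```python
import re

# Pattern from the answer: one ASCII letter, then any number of ASCII letters or digits.
IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")

# Sample words chosen for illustration only.
words = ["Z", "a2b3", "12abc", "r2-d2", ""]
identifiers = [w for w in words if IDENTIFIER.search(w)]
print(identifiers)
```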

CodePudding user response:

An even simpler pattern for an identifier, without the negative lookahead used in Wiktor's answer:

^[^0-9][A-Za-z0-9]*$ decomposed and explained:

  1. ^[^0-9]: The word must not start (^) with a digit ([^0-9] is a negated character class). More exactly, only the first character must not be a digit; later characters can be digits. Be aware that the negated class also accepts a non-alphanumeric first character such as an underscore or a hyphen.
  2. [A-Za-z0-9]*: The remaining characters, up to the end ($), must all be alphanumeric (not even a hyphen or underscore is allowed).

See demo on regex101.
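One caveat worth checking: [^0-9] accepts any non-digit, not just letters. The Python sketch below (Python and the sample words are assumptions) compares the negated first class with a letters-only first class. Words such as "_abc" and "-x1" get through the negated version even though rule 1 forbids them.

```python
import re

negated = re.compile(r"^[^0-9][A-Za-z0-9]*$")         # first char: anything but a digit
letter_first = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")  # first char: an ASCII letter

# Sample words chosen for illustration only.
comparison = {
    word: (bool(negated.search(word)), bool(letter_first.search(word)))
    for word in ["_abc", "-x1", "abc1", "1abc"]
}

for word, (neg, pos) in comparison.items():
    print(f"{word!r}: negated={neg}, letter_first={pos}")
```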

Positive alternative

As already suggested by Arvind Kumar Avinash: if (according to both rules) the first character must be a letter rather than merely a non-digit, we can change the first part of the regex above from "not a digit" to "letters only".

[A-Za-z][A-Za-z0-9]* explained:

  1. [A-Za-z]: the first character must be a letter
  2. [A-Za-z0-9]*: the optional second and following characters can be any alphanumeric characters

Anchored with ^ and $ as above, this gives the same results for the tests below; see demo on regex101.
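Since this pattern is written without ^ and $, how it is applied matters. A hedged Python sketch (the language is an assumption): re.search looks for a match anywhere in the input and so finds an identifier-like run inside an invalid word, whereas re.fullmatch requires the whole word to match and behaves like the anchored form.

```python
import re

UNANCHORED = re.compile(r"[A-Za-z][A-Za-z0-9]*")

word = "abc_123"
partial = UNANCHORED.search(word)    # may match just a part of the word
whole = UNANCHORED.fullmatch(word)   # must match the entire word

print(partial.group(0) if partial else None)
print(whole)
```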

Tests

input      result     reason
aB123      matches    identifier
Ab123      matches    identifier
XXXX12YZ   matches    identifier
a2b3       matches    identifier
a          matches    identifier
Z          matches    identifier
0          no match   starts with a digit
1Ab        no match   starts with a digit
12abc      no match   starts with a digit
abc_123    no match   contains underscore, not alphanumeric
r2-d2      no match   contains hyphen, not alphanumeric
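The table can also be checked mechanically. The following is a sketch in Python (an assumed language), using the letters-first pattern with fullmatch so that the whole input has to match; the names CASES and failures are illustrative.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# (input, expected result) pairs copied from the table above.
CASES = [
    ("aB123", True),
    ("Ab123", True),
    ("XXXX12YZ", True),
    ("a2b3", True),
    ("a", True),
    ("Z", True),
    ("0", False),
    ("1Ab", False),
    ("12abc", False),
    ("abc_123", False),
    ("r2-d2", False),
]

# Collect every case where the pattern disagrees with the expected result.
failures = [
    (text, expected)
    for text, expected in CASES
    if (IDENTIFIER.fullmatch(text) is not None) != expected
]
print("all cases pass" if not failures else failures)
```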