RegEx: How to match a whole string with fixed-length region with negative look ahead conditions that

Time:10-26

The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.

Examples (N=5, starting at the beginning):

12345ABC
12345123
1234-1
1234--1
1----1AB

How can I correctly match this? I am currently stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-] (for N=5), but I cannot make numbers match directly after the region when a dash is present, because the negative lookahead blocks the match.

Update: strings that should not be matched (N=5):

1-2-3-A
----1AB
--1--1A

CodePudding user response:

You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.

^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
  • ^ Start of string
  • (?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
  • (?=[\d-]{5}) Assert that the first 5 chars are digits or -
  • [A-Z\d-]+ Match 1+ times any of the listed characters
  • $ End of string
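
As a quick check, the lookahead pattern above can be run against the examples from the question. This is a minimal sketch in Python (the question does not name a regex flavor, so Python's `re` module is an assumption, and the helper name `matches` is made up for illustration):

```python
import re

# Pattern from the answer, with N=5 hard-coded.
PATTERN = re.compile(r"^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$")

# Example strings taken from the question.
should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_not_match = ["1-2-3-A", "----1AB", "--1--1A"]


def matches(s: str) -> bool:
    """Return True if the whole string satisfies the pattern."""
    return PATTERN.search(s) is not None


for s in should_match + should_not_match:
    print(f"{s!r}: {matches(s)}")
```

For a different N, the two quantifiers would presumably change together: {5} becomes {N} and {0,3} becomes {0,N-2}.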

If atomic groups are available:

^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
  • ^ Start of string
  • (?=[\d-]{5}) Assert that the first 5 chars are - or digits
  • (?> Atomic group
    • \d+-* Match 1+ digits and optional -
    • | or
    • -{5} match 5 times -
  • ) Close atomic group
  • [A-Z\d_]* Match optional chars A-Z digit or _
  • $ End of string

CodePudding user response:

Use a non-word-boundary assertion \B:

^[-\d](?:-|\B\d){4}[A-Z\d-]*$

A non-word boundary \B succeeds at a position between two word characters (\w, i.e. [A-Za-z0-9_]) or between two non-word characters (\W, i.e. [^A-Za-z0-9_]), and also between a non-word character and either end of the string.

Since the region contains only digits and dashes, each \B\d must directly follow a digit and can never follow a dash.
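
The effect of \B can be seen in isolation before trying the full pattern. The following is a small sketch, again assuming Python's `re` module (the helper name `region_ok` is hypothetical):

```python
import re

# \B\d only matches a digit that directly follows another word character:
# in "12-3" the "1" sits at the string start and the "3" follows a dash,
# so both are at word boundaries and only the "2" qualifies.
print(re.findall(r"\B\d", "12-3"))  # ['2']

# Full pattern from the answer (N=5: one leading char plus {4}).
PATTERN = re.compile(r"^[-\d](?:-|\B\d){4}[A-Z\d-]*$")


def region_ok(s: str) -> bool:
    """Return True if the whole string satisfies the \\B-based pattern."""
    return PATTERN.search(s) is not None


for s in ["12345ABC", "1234-1", "1----1AB", "1-2-3-A", "----1AB", "--1--1A"]:
    print(f"{s!r}: {region_ok(s)}")
```

The first character is matched without \B because a digit at the very start of the string always sits at a word boundary.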

Another way (if lookbehinds are allowed):

^\d*-*(?<=^.{5})[A-Z\d-]*$
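
To see how the lookbehind pins the end of the region to position 5, here is a sketch assuming Python's `re` module, which accepts this lookbehind because ^.{5} has a fixed width. The two capture groups are not part of the answer's pattern; they are added here only to show where the split lands, and `split_region` is a hypothetical helper name:

```python
import re

# The answer's pattern with two capture groups added for illustration:
# group 1 is the fixed-length region, group 2 is the rest of the string.
PATTERN = re.compile(r"^(\d*-*)(?<=^.{5})([A-Z\d-]*)$")


def split_region(s: str):
    """Return (region, rest) if the string matches, otherwise None."""
    m = PATTERN.search(s)
    return m.groups() if m else None


# \d* and -* are greedy, so the engine backtracks until exactly
# 5 characters precede the lookbehind position.
print(split_region("1234--1"))   # ('1234-', '-1')
print(split_region("12345123"))  # ('12345', '123')
print(split_region("----1AB"))   # None
```

Flavors that only allow fixed-width lookbehinds should accept this form as well, since the anchor itself consumes no characters, though that depends on the engine.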
