Abort regex execution when pattern found in negative lookahead syntax

Time:03-11

While trying to validate SQL Server's connection string pattern using regex, I arrived at the following result:

^(?!.*?(?<=^|\;)[a-zA-Z]+( [a-zA-Z]+)*(\=[^\;]+?\=[^\;]*)?(\;|$))([a-zA-Z]+( [a-zA-Z]+)*\=[^\;]+\;?)+$

Sample string used was:

option=value;missingvalue;multiple assignment=123=456

* (hosted and tested in regex101)

And, as expected, the string didn't match. The issue is that I suspect this isn't a standard, recommended, or optimal regex implementation, especially the negative lookahead part, considering it seems to keep going through the whole string even after a successful match.

I'll try to break down how it works below:


Negative Lookahead

1. ^(?!.*?(?<=^|;)

Negative lookahead pattern that starts either at the beginning of the string or, recursively throughout the string, just after a semicolon character.

2. [a-zA-Z]+( [a-zA-Z]+)*(=[^;]+?=[^;]*)?(;|$))

Matches the simple or composite option names, that is, just [a-zA-Z]+ (mandatory) or, additionally, ( [a-zA-Z]+)* any number of times. Afterwards there's an optional group that tries to match when there's more than one consecutive value assignment for any given option. Finally, it ends with either ; or $ (end of string); in the first case, the lookahead pattern restarts from the beginning (recursion).

Regular Pattern Matching

([a-zA-Z]+( [a-zA-Z]+)*=[^;]+;?)+$

Not much new to say here other than that this is the pattern which should actually match the string after the initial Negative Lookahead thorough scan/validation.
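
To see the two parts working together, here is a minimal Python sketch of the pattern as broken down above. It assumes `+` ("one or more") quantifiers on the letter and value classes, and it swaps the lookbehind `(?<=^|;)` for the equivalent `(?:^|(?<=;))`, because Python's `re` module only accepts fixed-width lookbehinds. The names `LOOKAHEAD`, `MAIN`, `PATTERN` and `is_valid` are made up for the illustration.

```python
import re

# Negative lookahead: reject the string if ANY option (starting at the
# beginning of the string or right after a ';') has no value at all or
# has two '=' signs before the next ';' (or the end of the string).
LOOKAHEAD = (
    r"(?!.*?(?:^|(?<=;))"
    r"[a-zA-Z]+(?: [a-zA-Z]+)*"   # simple or composite option name
    r"(?:=[^;]+?=[^;]*)?"         # optional: more than one assignment
    r"(?:;|$))"
)

# Regular matching: one or more name=value pairs, each optionally ending in ';'.
MAIN = r"(?:[a-zA-Z]+(?: [a-zA-Z]+)*=[^;]+;?)+"

PATTERN = re.compile("^" + LOOKAHEAD + MAIN + "$")


def is_valid(s):
    """Return True when the whole string passes both parts of the pattern."""
    return PATTERN.match(s) is not None


print(is_valid("option=value;missingvalue;multiple assignment=123=456"))  # False
print(is_valid("option=value;multiple assignment=123"))                   # True
```

The first call is rejected inside the lookahead as soon as `missingvalue;` is reached; the second string (a hypothetical well-formed variant of the sample) passes the lookahead and is then consumed by the main pattern.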


I can't deny that it's kinda working for what I intended, but I can't hold back the feeling that I'm misunderstanding something about regex's workings.

Is there an easier way to do this while avoiding having to recursively look ahead using the pattern described above multiple times?

EDIT: As requested, here are some closer-to-real-life examples of both valid and invalid formatting:

  • VALID
Database=somedb;Username=admin;Password=P@ssword!23;Port=1433
  • INVALID
  1. missing delimiter between Username and Password options
Database=somedb;Username=adminPassword=P@ssword!23;Port=1433
  2. missing value for Port option
Database=somedb;Port;Username=admin;Password=P@ssword!23
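
For what it's worth, the examples above can also be checked without any lookahead. The sketch below is only one possible approach, written in Python with hypothetical names (`NAME`, `VALUE`, `PAIR`, `CONN`, `is_valid`). It assumes names are letters separated by single spaces, values are any run of characters other than `=` and `;`, and the final `;` is optional. Since `=` is excluded from the value characters, a missing delimiter or a double assignment simply cannot match.

```python
import re

NAME = r"[a-zA-Z]+(?: [a-zA-Z]+)*"  # letters, optionally several space-separated words
VALUE = r"[^=;]+"                   # at least one character, none of them '=' or ';'
PAIR = NAME + "=" + VALUE

# First pair, then any number of ';'-prefixed pairs, then an optional final ';'.
CONN = re.compile(PAIR + "(?:;" + PAIR + ")*;?")


def is_valid(s):
    """Return True when the whole string is a ';'-separated list of name=value pairs."""
    return CONN.fullmatch(s) is not None


examples = [
    "Database=somedb;Username=admin;Password=P@ssword!23;Port=1433",  # valid
    "Database=somedb;Username=adminPassword=P@ssword!23;Port=1433",   # missing delimiter
    "Database=somedb;Port;Username=admin;Password=P@ssword!23",       # missing value
]
for s in examples:
    print(is_valid(s), s)
```

Note that this sketch would also reject a value that legitimately contains `=`, so it may be too strict for some real connection strings.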

Answer:

The following pattern accepts only letters for the names. For the purposes of testing, it accepts any character except equals and semicolon in the values. This would need to be tightened, as characters like line endings and tabs would need to be excluded. I was originally using a negative lookahead to forbid a second equals sign in the values and a negative lookbehind to forbid a semicolon before the end; the pattern below relies on the character class [^=;] instead. Please note that your "correct" example is found to be wrong because there is no semicolon at the end. If we try to block it the other way round, it becomes impossible to match the regex.
I've added an optional single space in the name to match "Connection Timeout" and similar.

/^(\s*[a-zA-Z]+ ?[a-zA-Z]+=[^=;]+;)+$/gm

I have also allowed spaces before the name.
Our pattern is made up of:
^ beginning of line
( start group
\s* optional whitespace before the name
[a-zA-Z]+ ?[a-zA-Z]+ name containing at least one letter before and after an optional space. This means at least two letters
= an equals sign
[^=;]+ any character except equals and semicolon, at least once (the earlier version opened an inner group here with (?!\=), a negative lookahead for an equals sign)
; a literal semicolon
)+ close the group and repeat it at least once (use {4,} instead to repeat it at least 4 times)
$ end of line
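
As a quick check of the breakdown above, here is a small Python sketch. It assumes the `+` quantifiers described as "at least once", uses `re.MULTILINE` to stand in for the `m` flag, and `ANSWER` and `matches` are illustrative names only.

```python
import re

# The pattern from this answer; re.MULTILINE makes ^ and $ work per line.
ANSWER = re.compile(r"^(\s*[a-zA-Z]+ ?[a-zA-Z]+=[^=;]+;)+$", re.MULTILINE)


def matches(s):
    """Return True when some line of s consists only of 'name=value;' groups."""
    return ANSWER.search(s) is not None


valid = "Database=somedb;Username=admin;Password=P@ssword!23;Port=1433"

print(matches(valid))        # False: no semicolon after the last value
print(matches(valid + ";"))  # True
print(matches("Connection Timeout=30;"))  # True: one optional space in the name
print(matches("Database=somedb;Port;Username=admin;Password=P@ssword!23;"))      # False
print(matches("Database=somedb;Username=adminPassword=P@ssword!23;Port=1433;"))  # False
print(matches("a=1;"))       # False: names need at least two letters
```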

Thank you Casimir et Hippolyte for the improvement. I was using look-aheads and look-backs following the question but your syntax is much cleaner.
