Correct way to use Lookahead and Lookbehind with RegEx to filter a list

Time:12-29

I have a list containing some names from an input field which all meet the conditions in the following Regex:

    ^([A-Za-z']*)+$|^([A-Za-z']*[ |-]?[A-Za-z]*)+$

BUONAROTTI SIMONI
DALI i DOMENECH
NIETZSCHE
O'COILEAIN AKA COLLINS
O'COILEAIN ALSO COLLINS
O'COILEAIN KNOWN AS COLLINS
O'COILEAIN ALSO KNOWN AS COLLINS
PAYNE-GAPOSCHKIN
TOULOUSE-LAUTREC-MONFA
VAN EYCK

However, I need to identify the names in the list that contain any of the following unwanted phrases (case-insensitive):

AKA, was, also, Also known as, Known as, Previously

I have tried the following Regexes without success. I just can't seem to get my head around Lookarounds in Regex:


^(?>!( WAS )|( AKA )|( KNOWN AS )|( PREVIOUSLY )|( ALSO )[A-Za-z']*)+$|^(?>!( WAS )|( AKA )|( KNOWN AS )|( PREVIOUSLY )|( ALSO )[A-Za-z']*[ |-]?[A-Za-z]*)+$

^[A-Za-z']*)+(?>!( WAS )|( AKA )|( KNOWN AS )|( PREVIOUSLY )|( ALSO )$|^[A-Za-z']*[ |-]?[A-Za-z]*)+(?>!( WAS )|( AKA )|( KNOWN AS )|( PREVIOUSLY )|( ALSO )$

^(?<!( WAS )|( AKA )|( KNOWN AS )|( PREVIOUSLY )|( ALSO )[A-Za-z']*)+$|^(?<!( WAS )|( AKA )|( KNOWN AS )|( PREVIOUSLY )|( ALSO )[A-Za-z']*[ |-]?[A-Za-z]*)+$

^[A-Za-z']*)(?<!( WAS )|( AKA )|( KNOWN AS )|( PREVIOUSLY )|( ALSO )+$|^[A-Za-z']*[ |-]?[A-Za-z]*)+(?<!( WAS )|( AKA )|( KNOWN AS )|( PREVIOUSLY )|( ALSO )$

Can somebody please point me in the right direction with Lookarounds (either Lookahead or Lookbehind)? Thanks in advance for any/all advice.

CodePudding user response:

In your pattern ^([A-Za-z']*)+$|^([A-Za-z']*[ |-]?[A-Za-z]*)+$ all the parts are optional, so it can also match an empty string.

You can drop one of the alternatives by making the second part an optionally repeated group, and require at least one character from the character class with +:

^[A-Za-z']+(?:[ -][A-Za-z]+)*$
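
As a quick check of the simplified pattern (written here with its `+` quantifiers), here is a minimal Python sketch. The choice of Python and the names `NAME`, `samples` and `results` are assumptions made for illustration; the question does not say which regex engine is in use.

```python
import re

# Simplified validation pattern: one word of letters/apostrophes, then any
# number of further words, each preceded by exactly one space or hyphen.
NAME = re.compile(r"^[A-Za-z']+(?:[ -][A-Za-z]+)*$")

# Illustrative inputs: four names from the question, then three edge cases.
samples = [
    "NIETZSCHE",
    "VAN EYCK",
    "TOULOUSE-LAUTREC-MONFA",
    "O'COILEAIN AKA COLLINS",
    "",           # empty string
    "VAN  EYCK",  # two consecutive spaces
    "VAN|EYCK",   # pipe is no longer part of the character class
]

results = {s: bool(NAME.match(s)) for s in samples}
for s, ok in results.items():
    print(repr(s), ok)
```

Note that this pattern only validates the allowed characters, so a name containing AKA still passes at this stage.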

If you want to identify the names that contain the unwanted phrases, you don't need lookarounds. You can match one of the unwanted alternatives between optional parts that allow only your specified characters in the character classes. Since the phrases should match case-insensitively, enable the case-insensitive flag.

You can omit the | from the character class [ |-] if you don't mean to match a literal pipe character, leaving [ -]

^(?:[A-Za-z']+(?:[ -][A-Za-z]+)* )?(?:AKA|was|Also(?: known as)?|Known as|Previously)(?: [A-Za-z']+(?:[ -][A-Za-z]+)*)?$
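
To see which names this pattern flags, here is a small Python sketch run against the list from the question. Python, the `re.IGNORECASE` flag and the names `UNWANTED`, `names` and `flagged` are assumptions for illustration; any engine with a case-insensitive option should behave similarly.

```python
import re

# Matches a name only when one of the unwanted phrases appears as a whole
# space-separated part, optionally surrounded by valid name parts.
UNWANTED = re.compile(
    r"^(?:[A-Za-z']+(?:[ -][A-Za-z]+)* )?"
    r"(?:AKA|was|Also(?: known as)?|Known as|Previously)"
    r"(?: [A-Za-z']+(?:[ -][A-Za-z]+)*)?$",
    re.IGNORECASE,
)

# The list from the question.
names = [
    "BUONAROTTI SIMONI",
    "DALI i DOMENECH",
    "NIETZSCHE",
    "O'COILEAIN AKA COLLINS",
    "O'COILEAIN ALSO COLLINS",
    "O'COILEAIN KNOWN AS COLLINS",
    "O'COILEAIN ALSO KNOWN AS COLLINS",
    "PAYNE-GAPOSCHKIN",
    "TOULOUSE-LAUTREC-MONFA",
    "VAN EYCK",
]

flagged = [n for n in names if UNWANTED.match(n)]
for n in flagged:
    print(n)
```

Because the phrase has to be preceded and followed by a space (or the start or end of the string), a name such as WASHINGTON is not flagged by this pattern.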


The other way around is to exclude those names with a negative lookahead, allowing a match only if none of the alternatives is present anywhere in the string (again with the case-insensitive flag enabled).

^(?!.*(?:AKA|was|Also(?: known as)?|Known as|Previously))[A-Za-z']+(?:[ -][A-Za-z]+)*$
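
The lookahead variant can be sketched in Python like this, keeping only the names without an unwanted phrase. Python and the names `KEEP`, `KEEP_WORDS`, `names` and `kept` are assumptions for illustration. The `\b` word boundaries in `KEEP_WORDS` are my own addition, not part of the answer above. Without them the lookahead also rejects names that merely contain one of the phrases as a substring.

```python
import re

ALTERNATIVES = r"(?:AKA|was|Also(?: known as)?|Known as|Previously)"
NAME_BODY = r"[A-Za-z']+(?:[ -][A-Za-z]+)*$"

# Pattern as given above: reject if any alternative occurs anywhere.
KEEP = re.compile(r"^(?!.*" + ALTERNATIVES + r")" + NAME_BODY, re.IGNORECASE)

# Assumed variant: word boundaries so that only whole words are rejected.
KEEP_WORDS = re.compile(
    r"^(?!.*\b" + ALTERNATIVES + r"\b)" + NAME_BODY, re.IGNORECASE
)

# The list from the question.
names = [
    "BUONAROTTI SIMONI",
    "DALI i DOMENECH",
    "NIETZSCHE",
    "O'COILEAIN AKA COLLINS",
    "O'COILEAIN ALSO COLLINS",
    "O'COILEAIN KNOWN AS COLLINS",
    "O'COILEAIN ALSO KNOWN AS COLLINS",
    "PAYNE-GAPOSCHKIN",
    "TOULOUSE-LAUTREC-MONFA",
    "VAN EYCK",
]

kept = [n for n in names if KEEP.match(n)]
for n in kept:
    print(n)

# Substring caveat: "WASHINGTON" contains "was".
print(bool(KEEP.match("WASHINGTON")), bool(KEEP_WORDS.match("WASHINGTON")))
```

Which variant is appropriate depends on whether surnames that happen to contain "was" or "aka" can occur in the input.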
