Regex: Match pattern unless preceded by pattern containing element from the matching character class

Time:09-07

I am having a hard time coming up with a regex to match a specific case:

This can be matched: any dashed string

  • any-dashed-strings
  • this-can-be-matched-even-though-its-big

This cannot be matched: strings starting with elem- or asdf-, or a single -

  • elem-this-cannot-be-matched
  • asdf-this-cannot-be-matched
  • -

So far what I came up with is:

/\b(?!elem-|asdf-)([\w\-]+)\b/

But I keep matching a single - and the whole -this-cannot-be-matched suffix. I cannot figure out how to conditionally ignore a character that is part of the matching character class, and also match nothing at all when one of the excluded prefixes is present.

I am currently working with the Oniguruma engine (Ruby 1.9 / PHP multi-byte string module).

If possible, please elaborate on the solution. Thanks a lot!

CodePudding user response:

If a lookbehind is supported, you can assert a whitespace boundary to the left, and make the alternation of the two words (without the hyphen) optional, so that a string starting with a bare hyphen is rejected as well.

(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b

Explanation

  • (?<!\S) Assert a whitespace boundary to the left
  • (?! Negative lookahead, assert that what is directly to the right is not
    • (?:elem|asdf)?- Optionally match elem or asdf followed by -
  • ) Close the lookahead
  • [\w-]+ Match 1 or more word chars or -
  • \b A word boundary
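
As a quick check of the pattern above, here is a minimal Ruby sketch that runs it against the sample strings from the question. The method name dashed_words and the input string are made up for illustration; only the regex itself comes from the answer.

```ruby
# Hypothetical helper (not part of the original answer): returns every
# substring of `text` matched by the lookbehind-based pattern.
def dashed_words(text)
  # (?<!\S)               no non-whitespace character directly to the left
  # (?!(?:elem|asdf)?-)   reject "elem-", "asdf-" and a leading "-"
  # [\w-]+\b              one or more word chars or hyphens, ending at a word boundary
  pattern = /(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b/
  text.scan(pattern)
end

input = "any-dashed-strings elem-this-cannot-be-matched " \
        "asdf-this-cannot-be-matched - this-can-be-matched-even-though-its-big"

p dashed_words(input)
# => ["any-dashed-strings", "this-can-be-matched-even-though-its-big"]
```

Because (?<!\S) only allows a match to start at the beginning of the string or right after whitespace, the engine can no longer restart in the middle of a rejected word and pick up the -this-cannot-be-matched suffix.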

See a regex demo.

Or a version with a capture group and without a lookbehind:

(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b
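
A minimal Ruby sketch of how the capture-group variant could be used. The method name captured_words and the input are assumptions for illustration. Since the pattern consumes the leading whitespace, the full match may start with a space, so the sketch reads group 1 instead of the whole match.

```ruby
# Hypothetical helper (not part of the original answer): collects the
# contents of capture group 1 for every match in `text`.
def captured_words(text)
  # (?:\s|^) consumes a whitespace char or anchors at a line start,
  # so the wanted text is in group 1, not in the full match.
  pattern = /(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b/
  # With one capture group, String#scan returns [[group1], ...]
  text.scan(pattern).map(&:first)
end

input = "any-dashed-strings elem-this-cannot-be-matched - " \
        "this-can-be-matched-even-though-its-big"

p captured_words(input)
# => ["any-dashed-strings", "this-can-be-matched-even-though-its-big"]
```

Note that in Ruby ^ matches at the start of every line, not only at the start of the string, so a word that begins a new line is also picked up.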

See another regex demo.
