REGEX: Match strings that contain a specific word but not strings that merely contain that word

Time:11-19

For example, I want to match all strings that contain 'cat' or 'dog', such as concatenation, doghouse, underdog, catastrophe, or endogamy, but I want to exclude the exact words dogs and cats. I tried the following regex.

\\w*(cat|dog)(s(?=\w ))*\

But this regex doesn't help me select whatever is after the s. Is there some other way to achieve this? Any help is appreciated.

CodePudding user response:

This regex should work:

const regex = /\b(?!cats\b|dogs\b)[a-z]*(?:cat|dog)[a-z]*\b/gi;
const m = 'concatenation, doghouse, underdog, catastrophe, endogamy should match, but not cats and dogs.'.match(regex);
console.log(m);

Output:

[
  "concatenation",
  "doghouse",
  "underdog",
  "catastrophe",
  "endogamy"
]

Explanation of regex:

  • \b -- word boundary
  • (?!cats\b|dogs\b) -- negative lookahead for just cats or dogs
  • [a-z]* -- optional alpha chars
  • (?:cat|dog) -- non-capture group for literal cat or dog
  • [a-z]* -- optional alpha chars
  • \b -- word boundary
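
The \b inside the lookahead is easy to overlook, so here is a small sketch of what it changes. The sample string and the variable names are made up for illustration; only the two patterns come from this answer. Without the inner \b, any word that merely starts with cats or dogs is rejected too.

```javascript
// Variant without \b inside the lookahead: rejects every word that STARTS with cats/dogs.
const loose = /\b(?!cats|dogs)[a-z]*(?:cat|dog)[a-z]*\b/gi;
// Variant with \b inside the lookahead: rejects only the exact words cats/dogs.
const strict = /\b(?!cats\b|dogs\b)[a-z]*(?:cat|dog)[a-z]*\b/gi;

// Hypothetical sample text, not taken from the question.
const sample = 'dogsled, catsup, cats, dogs, underdog';

const looseMatches = sample.match(loose);
const strictMatches = sample.match(strict);

console.log(looseMatches);  // [ 'underdog' ]
console.log(strictMatches); // [ 'dogsled', 'catsup', 'underdog' ]
```

If words such as dogsled should count as matches, the version with \b inside the lookahead is the one to use.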

CodePudding user response:

If you also don't want to match dogsdogs you might write the pattern as:

\b(?!\w*(?:cats\b|dogs\b))\w*(?:cat|dog)\w*

The pattern matches:

  • \b a word boundary
  • (?! Negative lookahead, assert that to the right is not
    • \w*(?:cats\b|dogs\b) Match optional word characters followed by cats or dogs followed by a word boundary
  • ) Close the lookahead
  • \w*(?:cat|dog)\w* Match cat or dog between optional word characters
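
As a quick check of the breakdown above, the pattern can be tried in JavaScript. This is only a sketch: the sample text and variable names are invented, and the g flag is added to collect all matches. Note the side effect that every word ending in cats or dogs is rejected (hotdogs as well as dogsdogs), not only the bare plurals.

```javascript
// Pattern from this answer, with the g flag added so match() returns every hit.
const noPluralEnding = /\b(?!\w*(?:cats\b|dogs\b))\w*(?:cat|dog)\w*/g;

// Hypothetical sample text, not taken from the question.
const text = 'doghouse dogsdogs hotdogs dogsled cats concatenation';

const found = text.match(noPluralEnding);
console.log(found); // [ 'doghouse', 'dogsled', 'concatenation' ]
```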



If lookbehind assertions are supported and you also want to allow other non-whitespace characters, you can use \S (any non-whitespace character) instead of \w (a word character).

(?<!\S)(?!\S*(?:cats\b|dogs\b))\S*(?:cat|dog)\S*
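
To see how the \S variant differs from the \w variant, the two can be run side by side. This is a sketch with a made-up sample string and made-up variable names, and it assumes a JavaScript engine with lookbehind support (ES2018 or later).

```javascript
// \w variant: a match stops at the first non-word character such as "-".
const wordChars = /\b(?!\w*(?:cats\b|dogs\b))\w*(?:cat|dog)\w*/g;
// \S variant: a match spans the whole whitespace-delimited token.
const nonSpace = /(?<!\S)(?!\S*(?:cats\b|dogs\b))\S*(?:cat|dog)\S*/g;

// Hypothetical sample text, not taken from the question.
const phrase = 'cat-flap top-dog dogs, cats';

const wordMatches = phrase.match(wordChars);
const tokenMatches = phrase.match(nonSpace);

console.log(wordMatches);  // [ 'cat', 'dog' ]
console.log(tokenMatches); // [ 'cat-flap', 'top-dog' ]
```

In both variants "dogs," is still rejected, because the \b in the lookahead sits between the s and the comma.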


CodePudding user response:

I understand your requirements as: match every word that has cat/dog anywhere in it, apart from the specific words 'cats' and 'dogs':

  \b(?!cats\b|dogs\b)(?=\S*cat\S*|\S*dog\S*)\S*\b

(Very) rough human translation: find a position at a word boundary where the word is not exactly cats or dogs, check that the word contains cat or dog (at the start, in the middle, or at the end), then match everything from that position to the end of the word.

Note: flavour - PCRE2
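
The answer targets PCRE2, but the pattern uses only word boundaries and lookaheads, which JavaScript also supports, so it can be tried the same way as the first answer. The snippet below is a sketch under that assumption; the sample string and variable names are invented for illustration.

```javascript
// Pattern from this answer, written as a JavaScript literal with the g flag added.
const lookaheadOnly = /\b(?!cats\b|dogs\b)(?=\S*cat\S*|\S*dog\S*)\S*\b/g;

// Hypothetical sample text, not taken from the question.
const words = 'concatenation doghouse cats dogs underdog';

const hits = words.match(lookaheadOnly);
console.log(hits); // [ 'concatenation', 'doghouse', 'underdog' ]
```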
