For example, I want to match all strings that contain the word 'cat' or 'dog' such as concatenation, doghouse, underdog, catastrophe, or endogamy. But I want to exclude the words dogs or cats from being matched. I tried this task using the following regex.
\\w*(cat|dog)(s(?=\w ))*\
But this regex doesn't help me select whatever is after the s. Is there some other way to achieve this? Any help is appreciated.
CodePudding user response:
This regex should work
const regex = /\b(?!cats\b|dogs\b)[a-z]*(?:cat|dog)[a-z]*\b/gi;
const m = 'concatenation, doghouse, underdog, catastrophe, endogamy should match, but not cats and dogs.'.match(regex);
console.log(m);
Output:
[
"concatenation",
"doghouse",
"underdog",
"catastrophe",
"endogamy"
]
Explanation of regex:
\b -- word boundary
(?!cats\b|dogs\b) -- negative lookahead for just cats or dogs
[a-z]* -- optional alpha chars
(?:cat|dog) -- non-capture group for literal cat or dog
[a-z]* -- optional alpha chars
\b -- word boundary
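The \b inside the lookahead matters: without it, any word that merely starts with cats or dogs is rejected too. A small sketch of the difference, using a made-up sample string (the words catsup and dogsled are illustrative and not from the question):

```javascript
// Hypothetical sample text: "catsup" and "dogsled" only start with "cats"/"dogs".
const text = 'cats catsup dogs dogsled underdog';

// Lookahead without \b: rejects every word that starts with "cats" or "dogs".
const loose = /\b(?!cats|dogs)[a-z]*(?:cat|dog)[a-z]*\b/gi;
// Lookahead with \b: rejects only the exact words "cats" and "dogs".
const strict = /\b(?!cats\b|dogs\b)[a-z]*(?:cat|dog)[a-z]*\b/gi;

const looseMatches = text.match(loose);
const strictMatches = text.match(strict);
console.log(looseMatches);  // [ 'underdog' ]
console.log(strictMatches); // [ 'catsup', 'dogsled', 'underdog' ]
```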
CodePudding user response:
If you also don't want to match dogsdogs, you might write the pattern as:
\b(?!\w*(?:cats\b|dogs\b))\w*(?:cat|dog)\w*
The pattern matches:
\b -- a word boundary
(?! -- negative lookahead, assert that what is to the right is not
\w*(?:cats\b|dogs\b) -- optional word characters followed by cats or dogs and a word boundary
) -- close the lookahead
\w*(?:cat|dog)\w* -- match cat or dog between optional word characters
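A quick way to check this in JavaScript, as a rough sketch with an invented word list. Note that because the lookahead scans the whole word, this pattern rejects any word that ends in cats or dogs (hotdogs here), not only dogsdogs:

```javascript
// Hypothetical sample words chosen to exercise the lookahead.
const words = 'concatenation doghouse dogsdogs hotdogs cats underdog';

// Reject any word that ends in "cats" or "dogs", then require "cat" or "dog" somewhere.
const noPluralEnding = /\b(?!\w*(?:cats\b|dogs\b))\w*(?:cat|dog)\w*/g;

const kept = words.match(noPluralEnding);
console.log(kept); // [ 'concatenation', 'doghouse', 'underdog' ]
```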
If a lookbehind assertion is supported, and you also want to allow other non-whitespace characters, you can use \S to match a non-whitespace character instead of \w, which matches a word character.
(?<!\S)(?!\S*(?:cats\b|dogs\b))\S*(?:cat|dog)\S*
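To see what switching from \w to \S changes, here is a minimal comparison. It assumes a JavaScript engine with lookbehind support (for example a recent Node.js), and the input string is made up for illustration:

```javascript
// Hypothetical input containing punctuation inside and after words.
const input = 'cat-flap under_dog! cats dogs, doghouse';

// Whitespace-delimited variant: a match spans the whole non-whitespace run.
const byWhitespace = /(?<!\S)(?!\S*(?:cats\b|dogs\b))\S*(?:cat|dog)\S*/g;
// Word-character variant: a match stops at the first non-word character.
const byWordChars = /\b(?!\w*(?:cats\b|dogs\b))\w*(?:cat|dog)\w*/g;

const whitespaceMatches = input.match(byWhitespace);
const wordCharMatches = input.match(byWordChars);
console.log(whitespaceMatches); // [ 'cat-flap', 'under_dog!', 'doghouse' ]
console.log(wordCharMatches);   // [ 'cat', 'under_dog', 'doghouse' ]
```

The \S variant keeps attached punctuation such as the trailing ! in the match, so it fits best when tokens are separated only by whitespace.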
CodePudding user response:
I understand your requirements as: match every word that has cat/dog anywhere in it, apart from the specific words 'cats' and 'dogs'.
\b(?!cats\b|dogs\b)(?=\S*cat\S*|\S*dog\S*)\S*\b
(Very) rough human translation: at a word boundary, check that the word is not exactly cats or dogs, then check that the non-whitespace characters ahead contain cat or dog (at the start, middle, or end), and then match everything from that point to the end of the word.
Note: flavour - PCRE2
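The pattern uses only word boundaries and lookaheads, which JavaScript's RegExp also supports, so it can be tried there as well. This is a sketch under that assumption (the answer itself targets PCRE2), reusing the sample sentence from the first answer:

```javascript
// Same pattern, run with JavaScript's RegExp engine instead of PCRE2 (an assumption).
const pattern = /\b(?!cats\b|dogs\b)(?=\S*cat\S*|\S*dog\S*)\S*\b/g;

const sample = 'concatenation, doghouse, underdog, catastrophe, endogamy should match, but not cats and dogs.';
const found = sample.match(pattern);
console.log(found);
// [ 'concatenation', 'doghouse', 'underdog', 'catastrophe', 'endogamy' ]
```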