I need to match all strings that contain one word of a list, but only if that word is not immediately preceded by another specific word. I have this regex:
.*(?<!forbidden)\b(word1|word2|word3)\b.*
That still matches a sentence like "hello forbidden word1", because "forbidden" is matched by the .* part. But if I remove the .*, I no longer match strings like "hello word1", which I want to match.
Note that I do want to match a string like "forbidden hello word1".
Could you suggest how to fix this?
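For reference, the behaviour described above can be reproduced with a short sketch. Python's re module is an assumption here, since the question does not name a regex flavor. The lookbehind inspects the nine characters directly before the keyword, and those always include the separating space, so it never sees exactly "forbidden".

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r".*(?<!forbidden)\b(word1|word2|word3)\b.*")

samples = [
    "hello word1",            # wanted: match
    "forbidden hello word1",  # wanted: match
    "hello forbidden word1",  # wanted: no match, but it matches anyway
]

for text in samples:
    print(f"{text!r}: {bool(pattern.fullmatch(text))}")
```

All three samples match, including the one that should be rejected.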
CodePudding user response:
This one seems to work well:
^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$
The part \b(?!(?:forbidden|word[1-3])\b)\w+ requires the word directly before the keyword to be something other than forbidden or word[1-3].
So it matches "hi forbidden hello word1 test" but not "hi hello forbidden word2 test".
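A quick way to check this answer is the sketch below. It assumes Python's re module and assumes the quantified form \w+ (one or more word characters) followed by a single space; the helper name is_match is hypothetical.

```python
import re

# Assumed reading of the answer's pattern: \w+ followed by one space.
pattern = re.compile(r"^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$")

def is_match(text: str) -> bool:
    """Return True if the pattern matches anywhere in text (hypothetical helper)."""
    return pattern.search(text) is not None

for text in [
    "hi forbidden hello word1 test",
    "hi hello forbidden word2 test",
    "hello word1",
    "word1",
]:
    print(f"{text!r}: {is_match(text)}")
```

Because the pattern insists on a preceding word and a single space, a string that starts with the keyword (such as "word1" alone) is not matched by this approach.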
CodePudding user response:
If what you want is to match the entire string, try this:
^(.(?<!forbidden (word1|word2|word3)\b))*((?<!forbidden )\b(word1|word2|word3)\b)+(.(?<!forbidden (word1|word2|word3)\b))*$
The technique comes from the thread "Regular expression to match a line that doesn't contain a word".
I've just reversed the order of the lookaround: each character is followed by a negative lookbehind instead of being preceded by a negative lookahead.
^(.(?<!forbidden (word1|word2|word3)\b))*
discards any string that contains the pattern forbidden (word1|word2|word3), and
((?<!forbidden )\b(word1|word2|word3)\b)
is the condition you defined.
But I just can't understand why you need this requirement.
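The structure of this answer may be easier to see when the pattern is assembled from named pieces. The sketch below assumes Python's re module, assumes the three parts are joined directly with no literal space between them, and uses non-capturing groups; the names WORDS, SAFE_CHAR, and TARGET are hypothetical.

```python
import re

WORDS = r"(?:word1|word2|word3)"

# Any single character that does not complete "forbidden <word>".
SAFE_CHAR = rf"(?:.(?<!forbidden {WORDS}\b))"

# A listed word that is not directly preceded by "forbidden ".
TARGET = rf"(?:(?<!forbidden )\b{WORDS}\b)"

pattern = re.compile(rf"^{SAFE_CHAR}*{TARGET}+{SAFE_CHAR}*$")

for text in [
    "hello word1",
    "forbidden hello word1",
    "hello forbidden word1",
    "forbidden word1 and word2",
]:
    print(f"{text!r}: {bool(pattern.search(text))}")
```

Note the design consequence: a string containing "forbidden word1" anywhere is rejected as a whole, even if another listed word elsewhere in the string would qualify on its own.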
CodePudding user response:
Have a look at word boundaries: \bword can never touch a word character to the left. Since forbidden ends in a word character, (?<!forbidden)\b can never reject anything.
To disallow (word1|word2|word3) if it is preceded by forbidden and one non-word character \W:
.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*
Or if it is preceded by forbidden and any amount of \W:
.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*
Regex101 demo (in the multiline demo I used [^\w\n] instead of \W so that it does not skip over line breaks).
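The difference between the two patterns in this answer shows up once more than one separator sits between forbidden and the keyword. The following is a minimal sketch, assuming Python's re module; the names one_sep and any_sep are hypothetical.

```python
import re

# Rejects the keyword only when "forbidden" plus exactly one \W precedes it.
one_sep = re.compile(r".*?\b(?<!forbidden\W)(word1|word2|word3)\b.*")

# Rejects the keyword when "forbidden" plus any run of \W precedes it.
any_sep = re.compile(r".*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*")

samples = [
    "hello word1",
    "forbidden hello word1",
    "hello forbidden word1",
    "hello forbidden   word1",  # three spaces between the two words
]

for text in samples:
    print(
        f"{text!r}: one_sep={bool(one_sep.fullmatch(text))}, "
        f"any_sep={bool(any_sep.fullmatch(text))}"
    )
```

Both patterns accept the first two samples and reject the third; only the second pattern also rejects the sample with three spaces.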