Regex match word not immediately preceded by another word but possibly preceded by that word before

Time:06-24

I need to match all strings that contain one word of a list, but only if that word is not immediately preceded by another specific word. I have this regex:

.*(?<!forbidden)\b(word1|word2|word3)\b.*

but it still matches a sentence like hello forbidden word1 because forbidden is matched by .*. If I remove the .*, I no longer match strings like hello word1, which I do want to match.

Note that I want to match a string like forbidden hello word1.
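For reference, the behaviour can be reproduced with a small script. This is only a sketch: the choice of Python and its standard `re` module, and the variable names, are assumptions for illustration; the pattern is the one shown above.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r".*(?<!forbidden)\b(word1|word2|word3)\b.*")

samples = [
    "hello word1",            # should match
    "forbidden hello word1",  # should match
    "hello forbidden word1",  # should NOT match, but currently does
]

# Map each sample to whether the whole string matches.
results = {s: pattern.fullmatch(s) is not None for s in samples}

for text, matched in results.items():
    print(f"{text!r}: {'match' if matched else 'no match'}")
```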

Could you suggest how to fix this problem?

CodePudding user response:

This one seems to work well:

^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$

\b(?!(?:forbidden|word[1-3])\b)\w+ requires the word directly before the target to be something other than forbidden or word[1-3].

So it matches hi forbidden hello word1 test but not hi hello forbidden word2 test.
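A quick way to check those two sentences is a short Python script. This is a sketch under one assumption: the quantifier after \w is taken to be + followed by a single literal space (i.e. `\w+ `). The names `pattern` and `check` are made up for the example.

```python
import re

# Assumed form of the pattern: \w+ followed by one space before the target word.
pattern = re.compile(r"^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$")

def check(text: str) -> bool:
    """Return True if the whole line is accepted by the pattern."""
    return pattern.search(text) is not None

print(check("hi forbidden hello word1 test"))  # target preceded by "hello"
print(check("hi hello forbidden word2 test"))  # target preceded by "forbidden"
print(check("hello word1"))                    # target preceded by "hello"
print(check("word1"))                          # no preceding word at all
```

Because the pattern demands a preceding word and a space, a string that consists of the target word alone is rejected as well, which may or may not be what you want.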

CodePudding user response:

If you want to match the entire string, try this:

Regex test

^(.(?<!forbidden (word1|word2|word3)\b))*((?<!forbidden )\b(word1|word2|word3)\b)+(.(?<!forbidden (word1|word2|word3)\b))*$

This is based on the thread "Regular expression to match a line that doesn't contain a word".

I've just reversed the order of the look-around.

^(.(?<!forbidden (word1|word2|word3)\b))* discards any string that contains the pattern forbidden (word1|word2|word3).

((?<!forbidden )\b(word1|word2|word3)\b) is what you defined.

But I can't understand why you need this requirement.
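Here is a rough Python sketch for trying this approach. It assumes the middle group is followed by the quantifier + rather than a literal space, and it uses Python's `re` module, which accepts these lookbehinds because every alternative has the same length. The names `guard`, `pattern`, and `accepts` are hypothetical.

```python
import re

# Consumes one character at a time, refusing to step past "forbidden wordN".
guard = r"(.(?<!forbidden (word1|word2|word3)\b))*"

# Assumption: the middle group is quantified with "+".
pattern = re.compile(
    "^" + guard
    + r"((?<!forbidden )\b(word1|word2|word3)\b)+"
    + guard + "$"
)

def accepts(text: str) -> bool:
    """Return True if the whole string is matched."""
    return pattern.search(text) is not None

for text in ("hello word1", "forbidden hello word1", "hello forbidden word1"):
    print(f"{text!r}: {accepts(text)}")
```

Note that the guard rejects the whole string as soon as forbidden followed by one of the words appears anywhere in it, even if another occurrence of a listed word would be acceptable on its own.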

CodePudding user response:

Have a look at word boundaries: \bword can never touch a word character to the left, so (?<!forbidden)\b in your pattern can never fail.

To disallow (word1|word2|word3) if preceded by forbidden and

  • one non-word character \W

    .*?\b(?<!forbidden\W)(word1|word2|word3)\b.*
    

    See this demo at regex101

  • any amount of \W

    .*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*
    

    Regex101 demo (in the multiline demo I used [^\w\n] instead of \W so that it does not skip across lines)
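The two variants above can also be compared outside regex101. The following is a minimal sketch in Python, assuming the standard `re` module (both lookbehinds are fixed-width, so it accepts them); the names `one_sep`, `any_sep`, and `matches` are made up for this example.

```python
import re

# Variant 1: forbidden followed by exactly one non-word character.
one_sep = re.compile(r".*?\b(?<!forbidden\W)(word1|word2|word3)\b.*")

# Variant 2: forbidden followed by any amount of non-word characters.
any_sep = re.compile(r".*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*")

def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern is found in the text."""
    return pattern.search(text) is not None

samples = [
    "hello word1",
    "forbidden hello word1",
    "hello forbidden word1",
    "forbidden  word1",   # two spaces between the words
]

for text in samples:
    print(f"{text!r}: one_sep={matches(one_sep, text)}, any_sep={matches(any_sep, text)}")
```

The last sample is where the variants differ: with two separator characters, only the second pattern still rejects the string.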
