Regex to match text between two words where a third word is not found between them

Time:01-20

I am trying to write a regular expression that matches a block of text between two words, provided that a specific third word does not appear within that block.

I have been able to use a negative lookahead to partly achieve this. However, when the unwanted word appears later in the same line, outside the region that should be matched, the earlier match on that line is not found either.

For this example, the start word is cat, the end word is mat, and the word that must not appear between them is dog.

Given these three lines:

The cat sat on the mat.
The cat and the dog sat on the mat.
The cat sat on the mat. The dog sat on the mat.

I want to match the following:

cat sat on the mat (from the first line)
cat sat on the mat (from the third line)

The regex I have so far is cat(?!.*?dog).*?mat, which produces the following:

cat sat on the mat (from the first line only)

I want to change the regex to return cat sat on the mat from the third line also.
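The problem can be reproduced with a short script. This is a sketch that assumes Python's standard `re` module (the question does not name a regex engine); `first_matches` is a hypothetical helper written for this illustration. It runs the original pattern over the three sample lines and shows that the third line gets no match, because the lookahead is evaluated right after "cat" and scans to the end of the line.

```python
import re

# The three sample lines from the question.
LINES = [
    "The cat sat on the mat.",
    "The cat and the dog sat on the mat.",
    "The cat sat on the mat. The dog sat on the mat.",
]

# The original attempt. The lookahead (?!.*?dog) is tested right after "cat"
# and can scan to the end of the line, so a "dog" anywhere later on the line
# (even after "mat") makes the whole match fail.
ORIGINAL = re.compile(r"cat(?!.*?dog).*?mat")


def first_matches(pattern, lines):
    """Hypothetical helper: first match of `pattern` per line, or None."""
    results = []
    for line in lines:
        m = pattern.search(line)
        results.append(m.group(0) if m else None)
    return results


if __name__ == "__main__":
    for line, found in zip(LINES, first_matches(ORIGINAL, LINES)):
        print(f"{line!r} -> {found!r}")
```

The third line prints None even though "cat sat on the mat" appears before the dog.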

Answer:

You may use a tempered dot here:

\bcat\b(?:(?!\bdog\b).)*?\bmat\b


This pattern says to match:

\bcat\b             match "cat"
(?:(?!\bdog\b).)*?  match any content WITHOUT crossing "dog"
\bmat\b             until matching the nearest "mat"
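A minimal Python sketch of the tempered dot, again assuming the standard `re` module and the same hypothetical `first_matches` helper. Because the lookahead is re-checked before every character consumed, it only ever looks at text between "cat" and the candidate "mat", so a "dog" after the match no longer blocks it. The "hotdogs" sentence is a made-up input showing the effect of the word boundaries in `\bdog\b`.

```python
import re

LINES = [
    "The cat sat on the mat.",
    "The cat and the dog sat on the mat.",
    "The cat sat on the mat. The dog sat on the mat.",
]

# Tempered dot: each character consumed between "cat" and "mat" must not be
# the start of the whole word "dog". The lazy *? stops at the nearest "mat".
TEMPERED = re.compile(r"\bcat\b(?:(?!\bdog\b).)*?\bmat\b")


def first_matches(pattern, lines):
    """Hypothetical helper: first match of `pattern` per line, or None."""
    results = []
    for line in lines:
        m = pattern.search(line)
        results.append(m.group(0) if m else None)
    return results


if __name__ == "__main__":
    for line, found in zip(LINES, first_matches(TEMPERED, LINES)):
        print(f"{line!r} -> {found!r}")
    # Hypothetical extra input: "dog" inside "hotdogs" is not a whole word,
    # so \bdog\b does not fire and the match goes through.
    print(TEMPERED.search("The cat ate hotdogs on the mat.").group(0))
```

Dropping the `\b` anchors around dog would make the "hotdogs" sentence fail to match, which may or may not be what you want.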

Answer:

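You can also use a negated character class inside the lookahead, so that the check for dog stops at the first period (the end of the current sentence) instead of running to the end of the line: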

cat(?![^.]*dog).*?mat
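A sketch of this variant in Python, assuming the standard `re` module and the same hypothetical `first_matches` helper. It handles the three sample lines, but it relies on sentences ending with a period: the lookahead stops at the first ".", while `.*?mat` itself can still cross one. The last input below is a made-up sentence showing that edge case.

```python
import re

LINES = [
    "The cat sat on the mat.",
    "The cat and the dog sat on the mat.",
    "The cat sat on the mat. The dog sat on the mat.",
]

# The lookahead only scans non-period characters, so it checks for "dog"
# up to the end of the current sentence rather than the end of the line.
NEGATED = re.compile(r"cat(?![^.]*dog).*?mat")


def first_matches(pattern, lines):
    """Hypothetical helper: first match of `pattern` per line, or None."""
    results = []
    for line in lines:
        m = pattern.search(line)
        results.append(m.group(0) if m else None)
    return results


if __name__ == "__main__":
    for line, found in zip(LINES, first_matches(NEGATED, LINES)):
        print(f"{line!r} -> {found!r}")
    # Hypothetical edge case: no "dog" before the first period, so the
    # lookahead passes, but .*? then crosses the period and includes "dog".
    print(NEGATED.search("The cat sat. The dog sat on the mat.").group(0))
```

If "mat" can sit in a later sentence than "cat", the tempered dot from the previous answer is the safer choice.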