I'm using this regex pattern:
[^?!.\s][^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]
to match an entire sentence that contains any of the words listed in the pattern, even if the sentence spans multiple lines.
However, I've found that if the word of interest is the first word in the sentence, the sentence will not match.
For example: "The bird is dead." will match, but "Dog days are over." will not. The sentences I'm looking for are often grammatically incomplete, like the second one, but they still begin with a capital letter and end with a period.
CodePudding user response:
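You can use either of these:
(?=\S)[^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]
\b[^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]
Your pattern fails because [^?!.\s] consumes the first character of the sentence, so a keyword at the very start can no longer match. In the first regex, the first matched char must still be a non-whitespace char, because (?=\S) is a positive lookahead that matches a location immediately followed by a non-whitespace char, but it does not consume that char.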
The \b in the second variant is more specific: it matches a position between the start of the string or a non-word char and a word char, or between a word char and a non-word char or the end of the string.
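A quick way to compare the question's pattern with the two variants is to run all three over the sample sentences from the question. This is only a sketch: the variable and helper names are made up for illustration, and the lookahead is written as (?=\S), that is, "followed by a non-whitespace char", which is what the description of the first variant requires.

```javascript
// Sample text from the question; all names below are illustrative.
const text = "The bird is dead. Dog days are over.";

// Pattern from the question: [^?!.\s] consumes the first char of the sentence.
const original = /[^?!.\s][^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]/g;
// Variant 1: the lookahead checks for a non-whitespace char without consuming it.
const withLookahead = /(?=\S)[^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]/g;
// Variant 2: the match must start at a word boundary.
const withBoundary = /\b[^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]/g;

// Returns every full-sentence match, or an empty array when there is none.
const sentences = (re) => text.match(re) || [];

console.log(sentences(original));      // [ 'The bird is dead.' ]
console.log(sentences(withLookahead)); // [ 'The bird is dead.', 'Dog days are over.' ]
console.log(sentences(withBoundary));  // [ 'The bird is dead.', 'Dog days are over.' ]
```

Because [^?!.] also matches line breaks, all three patterns behave the same way when a sentence spans several lines.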
Note that in JavaScript the \b word boundary is not Unicode-aware. If you need full Unicode word boundary support, you will need a workaround; see "Replace certain arabic words in text string using Javascript".
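One possible workaround, shown here only as a sketch and not necessarily the approach taken in the linked answer, is to emulate \b with lookarounds over Unicode property escapes. It assumes an engine that supports the u flag, \p{...}, and lookbehind (recent Node.js and browsers do). The helper name sentenceRegex and the Cyrillic sample text are hypothetical.

```javascript
// Hypothetical helper, not part of any library: emulates \b around the
// keywords with lookarounds over Unicode letters, digits, and underscore.
const NOT_WORD_BEFORE = "(?<![\\p{L}\\p{N}_])";
const NOT_WORD_AFTER = "(?![\\p{L}\\p{N}_])";

function sentenceRegex(words) {
  // `words` is assumed to be a trusted regex alternation (no escaping is done).
  return new RegExp(
    `(?=\\S)[^?!.]*?${NOT_WORD_BEFORE}(${words})${NOT_WORD_AFTER}[^?!.]*[.?!]`,
    "gu"
  );
}

// "My cat sleeps. The dog barks. The cat eats." in Russian ("кот" = cat).
const sample = "Мой кот спит. Собака лает. Кот ест.";

// ASCII-only \b sees no boundary next to Cyrillic letters.
console.log(/\bкот\b/.test(sample)); // false

// The lookaround version finds both sentences that contain the word.
console.log(sample.match(sentenceRegex("[Кк]от"))); // [ 'Мой кот спит.', 'Кот ест.' ]
```

The lookarounds only assert that no letter, digit, or underscore touches the keyword, so a longer word that merely starts with the keyword is not treated as a match.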