I am trying to match every complete sentence in a document that contains certain words, even when the sentence spans multiple lines.
My current attempt only captures a sentence if it does not continue onto the next line:
^.*\b(dog|cat|bird)\b.*\.
I am using the ECMAScript (JavaScript) regex flavor.
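For reference, a minimal sketch of the behavior described above. The likely cause is that in ECMAScript `.` does not match line terminators unless the `s` (dotAll) flag is set, so `.*` cannot cross a line break. The sample strings and variable names are made up for illustration, and the `g` and `m` flags are an assumption about how the pattern is being run.

```javascript
// The pattern from the question; m makes ^ match at the start of each line.
const pattern = /^.*\b(dog|cat|bird)\b.*\./gm;

// Made-up samples: the same sentence on one line and wrapped over two lines.
const oneLine = "The dog ran across the yard.";
const wrapped = "The dog ran\nacross the yard.";

// With the g flag, match() returns an array of full matches, or null.
const oneLineResult = oneLine.match(pattern);
const wrappedResult = wrapped.match(pattern);

console.log(oneLineResult); // [ 'The dog ran across the yard.' ]
console.log(wrappedResult); // null, because no period follows "dog" on its line
```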
CodePudding user response:
If the input is not expected to contain abbreviations, use:
[^?!.\s][^?!.]*?\b(dog|cat|bird)\b[^?!.]*[.?!]
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  [^?!.\s]                 any character except: '?', '!', '.',
                           whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  [^?!.]*?                 any character except: '?', '!', '.' (0 or
                           more times (matching the least amount
                           possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    dog                      'dog'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    cat                      'cat'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    bird                     'bird'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  [^?!.]*                  any character except: '?', '!', '.' (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [.?!]                    any character of: '.', '?', '!'
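
As a rough usage sketch in JavaScript, the pattern can be run with the `g` flag to collect every matching sentence. The sample text and the `findSentences` helper are hypothetical names introduced only for illustration. Because the negated class `[^?!.]` also matches newline characters, a sentence that wraps across lines is still captured as a whole.

```javascript
// Pattern from the answer; the g flag lets matchAll find every sentence.
const sentenceWithKeyword = /[^?!.\s][^?!.]*?\b(dog|cat|bird)\b[^?!.]*[.?!]/g;

// Hypothetical helper: returns each matching sentence and the keyword
// captured by group 1.
function findSentences(text) {
  return Array.from(text.matchAll(sentenceWithKeyword), (m) => ({
    sentence: m[0],
    keyword: m[1],
  }));
}

// Made-up sample text; the "dog" sentence wraps across two lines.
const sample = "I like tea. The dog ran\nacross the yard. Is that a bird? Bye.";

for (const { sentence, keyword } of findSentences(sample)) {
  // JSON.stringify makes the embedded newline visible as \n.
  console.log(keyword + ": " + JSON.stringify(sentence));
}
// dog: "The dog ran\nacross the yard."
// bird: "Is that a bird?"
```

Note that every '.', '?', or '!' is treated as a sentence terminator, so an abbreviation such as "Mr." would cut a sentence short; this is why the pattern assumes input without abbreviations. If keywords may appear capitalized, adding the `i` flag is one option.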