Matching an entire sentence containing words even if the sentence spans multiple lines

Time:12-06

I am trying to match every complete sentence in a document that contains certain words, even if the sentence spans multiple lines.

My current attempt only captures the sentence if it fits on a single line.

^.*\b(dog|cat|bird)\b.*\.

Using ECMAScript.
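
For reference, a minimal sketch of why this attempt stops at line breaks: in ECMAScript, `.` does not match line terminators unless the `s` (dotAll) flag is set, so `.*` can never cross a newline. The sample strings and variable names below are made up for illustration, and the `m` flag variant is only an assumption about how the pattern might have been run.

```javascript
// The pattern from the question, once without flags and once with the m flag.
// (The m flag is an assumption; the question does not say which flags were used.)
const attempt = /^.*\b(dog|cat|bird)\b.*\./;
const attemptMultilineFlag = /^.*\b(dog|cat|bird)\b.*\./m;

// Hypothetical sample input.
const oneLine = "The dog barks.";
const multiLine = "The quick\nbrown dog jumps\nover the fence.";

// A sentence on a single line is found.
const oneLineMatch = attempt.exec(oneLine);

// A sentence broken across lines is not: "." refuses to match "\n",
// so ".*" cannot reach the keyword from the start of the input, and with
// the m flag it cannot reach the closing period from the keyword's line.
const multiLineMatch = attempt.exec(multiLine);
const multiLineMatchWithM = attemptMultilineFlag.exec(multiLine);

console.log(oneLineMatch && oneLineMatch[0]);
console.log(multiLineMatch, multiLineMatchWithM);
```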

CodePudding user response:

If the input is not expected to contain abbreviations (such as "e.g." or "Dr."), use:

[^?!.\s][^?!.]*?\b(dog|cat|bird)\b[^?!.]*[.?!]

EXPLANATION

--------------------------------------------------------------------------------
  [^?!.\s]                 any character except: '?', '!', '.',
                           whitespace (\n, \r, \t, \f, \v, " ", and
                           other Unicode whitespace in ECMAScript)
--------------------------------------------------------------------------------
  [^?!.]*?                 any character except: '?', '!', '.' (0 or
                           more times (matching the least amount
                           possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    dog                      'dog'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    cat                      'cat'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    bird                     'bird'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  [^?!.]*                  any character except: '?', '!', '.' (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [.?!]                    any character of: '.', '?', '!'
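
A minimal usage sketch of the pattern above in JavaScript. The `g` flag, the sample text, and the variable names are assumptions added for illustration; they are not part of the original answer. Because the character classes `[^?!.]` match newlines, no `s` or `m` flag is needed.

```javascript
// The answer's pattern, with the g flag added so that every matching
// sentence is returned rather than only the first one.
const sentenceWithKeyword = /[^?!.\s][^?!.]*?\b(dog|cat|bird)\b[^?!.]*[.?!]/g;

// Hypothetical sample input: two of the four sentences contain a keyword,
// and both of those are broken across lines.
const text =
  "It was raining. The quick\nbrown dog jumps\nover the fence. " +
  "Nothing here! Did the cat\nsee it?";

// matchAll requires the g flag; m[0] is the whole sentence and
// m[1] is the keyword captured by the group.
const sentences = Array.from(text.matchAll(sentenceWithKeyword), (m) => m[0]);

// Optionally collapse the embedded line breaks into single spaces.
const normalized = sentences.map((s) => s.replace(/\s+/g, " "));

console.log(normalized);
// [ 'The quick brown dog jumps over the fence.', 'Did the cat see it?' ]
```

As the answer notes, this relies on `.`, `?` and `!` appearing only at the end of a sentence. With an input such as "Dr. Smith has a dog." the match would start at "Smith", because the period in the abbreviation is treated as a sentence end.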