JavaScript: Using regex to find all sentence endpoints in a paragraph

Time:12-23

I have seen many solutions on Stack Overflow, but after testing a lot of them, I found that they struggle with this task.

My current regex comes close, but it fails when the text contains Mr. or Mrs.: it treats the dot after the abbreviation as a sentence end.

Current regex pattern:

/(?<![A-Z]\.)(?<=[.!?;])(?=[ A-Z])/
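To make the failure concrete, here is a small sketch that runs the pattern above through `String.prototype.split`. The sample sentence and the variable names are made up for illustration and are not taken from the test string below. It assumes a runtime with lookbehind support, such as a recent Node.js or Chrome.

```javascript
// The pattern from the question, unchanged.
const originalPattern = /(?<![A-Z]\.)(?<=[.!?;])(?=[ A-Z])/;

// Hypothetical sample text (an assumption, not the question's test string).
const sampleWithTitle = "Mr. Cool is here. Done.";

// Split at every zero-width match, then trim the leading spaces.
const originalPieces = sampleWithTitle
  .split(originalPattern)
  .map((piece) => piece.trim());

console.log(originalPieces);
// → [ 'Mr.', 'Cool is here.', 'Done.' ]
```

The negative lookbehind only guards against a single capital letter before the dot, so the lowercase "r" in "Mr." slips through and the title is cut off from its sentence.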

I'd like to either return all the sentences or split at each sentence end while keeping the punctuation.

Test string:

Cool commonly refers to: Cool, a moderately low temperature Cool (aesthetic), an aesthetic of attitude, behavior, and styleCool or COOL may also refer to: Country of origin labelling mCOOL - US consumer legislation to enforce COOL at the grocery store Cool (programming language) COOL, a computer language used in the CLIPS tool Cool, an internal name of C# Cool (Rotterdam), Netherlands Cool, California, U.S. Cool, Texas, U.S. Cool (band), a South Korean K-pop music group Cool jazz Cool (George Duke album) (2000) Lupe Fiasco's The Cool (2007) The Cool (character), the associated concept character Cool (Joyce album) (2015) "Cool" (Alesso song) (2015) "Cool" (Anthony Hamilton song) (2008) "Cool" (Jonas Brothers song) (2019) "Cool" (Le Youth song) (2013) "Cool" (Dua Lipa song) (2020) "Cool" (Gwen Stefani song) (2005) "Cool" (The Time song) (1981), later co 0.0.0.0. (.) vered. By Snoop Dogg and Prince "Cool" (West Side Story song) (1957) Cool (producer), American hip hop producer Fabien Cool (born 1972), French footballer Tré Cool (born 1972), American drummer (Green Day) Wim Cool (born 1943), Dutch politician LL Cool J (born 1968), American rapper CoolTV, a Canadian television channel Cool TV, a U.S. Hungarian television channel "Cool" (Smallville), an episode of Smallville COOL Award, children's book choice award Cool colors, a perc 8.5/10 eptual and psychological classification of colors Cumhall, a figure in Irish mythology Majesco Entertainment' NASDAQ ticker symbol Mr. Cool (Mr. Men), a fictional character in the Mr. Men children's book series Steve McQueen: popularly known as "The King of Cool" Cool Change (disambiguation) Cool Kids (disambiguation) Kool (disambiguation) Mister Cool (disambiguation) All pages with titles beginning with Cool All pages with titles containing Cool. Let's just add some other use cases (here). And "here."

Regex tester link: https://regex101.com/r/1hqO1I/1

CodePudding user response:

You could extend the negative lookbehind in your pattern with an alternation (|) so that it also excludes Mr. and Mrs., each preceded by a word boundary on the left.

(?<!(?:\bMrs?|[A-Z])\.)(?<=[.!?;])(?=[ A-Z])

Explanation

  • (?<! Negative lookbehind
    • (?:\bMrs?|[A-Z])\. Match either Mr or Mrs (after a word boundary) or a single char A-Z, followed by a dot
  • ) Close the lookbehind
  • (?<=[.!?;]) Positive lookbehind, assert one of . ! ? ; to the left
  • (?=[ A-Z]) Positive lookahead, assert either a space or a char A-Z to the right
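
As a rough sketch of how the extended pattern could be used to get the sentences back, the snippet below splits on the zero-width match and trims each piece. The function name `splitSentences` and the sample text are assumptions made for illustration. It relies on lookbehind support in the JavaScript engine.

```javascript
// The extended pattern from the answer. No g flag is needed for split().
const sentenceBoundary = /(?<!(?:\bMrs?|[A-Z])\.)(?<=[.!?;])(?=[ A-Z])/;

// Hypothetical helper: split at each boundary, trim, and drop empty pieces.
function splitSentences(text) {
  return text
    .split(sentenceBoundary)
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0);
}

// Hypothetical sample text (an assumption, not the question's test string).
const sample = "Mr. Smith went home. He slept! Did Mrs. Jones call? Yes.";

console.log(splitSentences(sample));
// → [ 'Mr. Smith went home.', 'He slept!', 'Did Mrs. Jones call?', 'Yes.' ]
```

Because the match is zero-width, the punctuation stays attached to the sentence it ends. Other abbreviations (Dr., etc.) would still be split and would have to be added to the alternation in the same way.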
