I'm trying to split a paragraph of text into sentences. My current solution works most of the time, but not always. Here's my regex:
text.replaceAll(/[^Mr|^mr|^Mrs|^mrs|^Ms|^ms](\.|\?|\!)\s[A-Z]/g, r => r.replace(/\s/, "{break}")).split("{break}");
The way it should work is to find a period (or a question or exclamation mark) and then a space followed by a capital letter, except after Mr, mr, Mrs, mrs, Ms, or ms. It currently does that, except when the sentence ends in an s, m, or r. I know this is because [] matches any single character listed inside it. My question is: how do I write this so it does what I want (match the full word, not the individual characters)?
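The character-class behaviour described above can be checked with a small sketch. The two test strings here are illustrative assumptions, chosen so that one sentence ends in "s" and the other in "d"; the variable name is hypothetical.

```javascript
// Inside [...], the letters form an unordered set of single characters.
// "|" and every "^" after the first are treated as literal characters.
const negatedSet = /[^Mr|^mr|^Mrs|^mrs|^Ms|^ms](\.|\?|!)\s[A-Z]/;

// "s" is in the excluded set, so the break after "rains." is never matched.
console.log(negatedSet.test("It rains. Go home.")); // false

// "d" is not in the set, so the break after "cold." is found.
console.log(negatedSet.test("It is cold. Go home.")); // true
```

So the pattern rejects any sentence whose last letter is M, r, m, or s, regardless of the word it belongs to.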
An example of a string that fails would be
"A string with words. A new string."
and one that passes
"A string. A new string."
CodePudding user response:
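How to split text into sentences without breaking on name titles.
Gotta love regex. Instead of a character class, use a negative lookbehind, (?<!...), to rule out the titles as whole words, and capture the punctuation mark and the capital letter that follows it in groups. The replacement string then puts those groups back as $1 and $2, with the {break} marker in place of the space between them.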
let text = "A string with words. Mr. Andrews wrote a new string. It went something like Mrs. Doubtfire's best line. But what if 3.1 people want to sign up for Gecko? What if it ends in a question mark?";
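// No "i" flag, so [A-Z] only matches capitals; \b stops words like "arms" from being read as the title "ms".
const OPExample = text.replace(/(?<!\b(?:[Mm]rs?|[Mm]s|[Dd]r|[Ss]r))([.?!]) ([A-Z])/g, "$1{break}$2");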
const SplitLines = OPExample.split("{break}");
console.log(OPExample); // "A string with words.{break}Mr. Andrews wrote a new string.{break}It went something like Mrs. Doubtfire's best line.{break}But what if 3.1 people want to sign up for Gecko?{break}What if it ends in a question mark?"
console.log(SplitLines); // ["A string with words.","Mr. Andrews wrote a new string.","It went something like Mrs. Doubtfire's best line.","But what if 3.1 people want to sign up for Gecko?","What if it ends in a question mark?"]
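If the {break} placeholder is not needed for anything else, the same idea can probably be expressed as a single split on the whitespace itself, using lookarounds on both sides. This is only a sketch: splitSentences and TITLES are hypothetical names, the title list is an assumption to extend as needed, and it requires a runtime with lookbehind support.

```javascript
// Hypothetical title list; extend as needed.
const TITLES = ["Mr", "Mrs", "Ms", "Dr", "Sr"];

function splitSentences(paragraph) {
  // Accept each title both as written and in lower case ("Mr" and "mr").
  const titles = TITLES.flatMap(t => [t, t.toLowerCase()]).join("|");
  const boundary = new RegExp(
    "(?<=[.?!])" +                     // whitespace must follow ., ? or !
    `(?<!\\b(?:${titles})\\.)` +       // ...but not a whole-word title plus "."
    "\\s+" +                           // the whitespace that gets removed
    "(?=[A-Z])",                       // next sentence starts with a capital
    "g"
  );
  return paragraph.split(boundary);
}

console.log(splitSentences("A string with words. Mr. Andrews wrote a new string. Is 3.1 a number? Yes."));
// ["A string with words.", "Mr. Andrews wrote a new string.", "Is 3.1 a number?", "Yes."]
```

Because the lookarounds consume nothing, only the whitespace is dropped and the punctuation stays attached to its sentence. A sentence that genuinely ends in a title (for example "Ask the Dr. He knows.") will still not be split, which is a limitation of any title list approach.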