I have this text:
const text = "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west."
I need to split it at each dot (keeping the dot in the sentence), but only at a single dot, not at a run of dots such as "...". So I want the following array as a result:
const result = [
"If you look at a map of Europe... you will... notice.",
"That apart from the big... landmass known as the continent, there are two small islands to the west."
]
I have already created a function that can split sentences by a dot, question mark, and exclamation mark, but it doesn't work properly when there are triple dots in the sentence.
function splitByPunctuationMark(str) {
  return str.split(/(?<=[!.?])/).map(value => value.trim())
}
UPDATED
splitByPunctuationMark() gives me the following results:
#1 When there are no triple dots in the source text
const result = splitByPunctuationMark("If you look at a map of Europe you will notice. That apart from the big landmass known as the continent, there are two small islands to the west.")
console.log(result)
/*
[
"If you look at a map of Europe you will notice.",
"That apart from the big landmass known as the continent, there are two small islands to the west."
]
*/
#2 When there are triple dots in the source text
const result = splitByPunctuationMark("If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west.")
console.log(result)
/*
[
"If you look at a map of Europe.",
".",
".",
"you will.",
".",
".",
"notice.",
"That apart from the big.",
".",
".",
"landmass known as the continent, there are two small islands to the west."
]
*/
CodePudding user response:
Here is one way of doing it. First I hide every "..." by converting it to a placeholder string. This string needs to be chosen carefully, so that it does not occur anywhere in the target string. After splitting at the remaining single "." characters, I restore each "..." to its original position.
const text = "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west.And here is a third ... is it a sentence? And a forth!"
const sep = "@threedots@"; // placeholder; must not occur anywhere in text
const res = text.replaceAll("...", sep).split(/(?<=[.!?])\s*/).map(e => e.replaceAll(sep, "..."));
console.log(res);
In order to preserve the "." at the end of each sentence I used a lookbehind in the regular expression: /(?<=[.!?])\s*/. This treats a run of zero or more whitespace characters as the separator, provided it occurs immediately after a ".", a "!" or a "?".
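The placeholder idea above could be wrapped in a small function that also checks the "not found anywhere in the target string" condition. This is only a sketch: the name splitKeepingEllipses and the placeholder default are my own choices for illustration.

```javascript
// Hypothetical helper; the function name and default placeholder are assumptions.
function splitKeepingEllipses(text, placeholder = "@threedots@") {
  // Guard: the placeholder must not already occur in the input.
  if (text.includes(placeholder)) {
    throw new Error("placeholder occurs in the text; choose another one");
  }
  return text
    .replaceAll("...", placeholder)                    // hide every "..."
    .split(/(?<=[.!?])\s*/)                            // split after ".", "!" or "?"
    .map(part => part.replaceAll(placeholder, "..."));  // restore every "..."
}

const sample = "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west.";
console.log(splitKeepingEllipses(sample));
```

For the text from the question this should log the two sentences, each with its "..." intact.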
@Martin Niederl quite rightly pointed out that any number of repeated dots may occur. He presented a solution which I also consider helpful. Here is my take on it (also allowing for other end-of-sentence characters):
const text = "If you look at a map of Europe..... you will... notice. That apart from the big.. landmass known as the continent, there are two small islands to the west.And here is a third .... Is it a sentence? And a forth!"
const res=text.split(/(?<=(?<!\.)[.!?](?!\.))\s*/);
console.log(res);
I now have a positive lookbehind that contains a pattern consisting of a negative lookbehind for a ".", followed by exactly one of the characters ".", "!" or "?", followed by a negative lookahead for another ".". Immediately after the positive lookbehind I require a sequence of zero or more whitespace characters.
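If you want to reuse this, the same regular expression can go into a function. A minimal sketch; the name splitSentences is just an assumption for illustration:

```javascript
// Hypothetical wrapper around the regular expression from this answer.
function splitSentences(text) {
  // Split on optional whitespace that follows ".", "!" or "?",
  // provided that character has no dot directly before or after it.
  return text.split(/(?<=(?<!\.)[.!?](?!\.))\s*/);
}

console.log(splitSentences("If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west."));
console.log(splitSentences("Wait... what? Yes!"));
```

Because the whitespace is part of the separator, the resulting sentences need no extra trim() as long as whitespace only occurs between sentences and not at the very start or end of the text.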
CodePudding user response:
/(?<!\.)\.(?!\.)/ matches only individual dots.
You could then split on that pattern, remove empty sentences, trim each sentence, and re-append the dot that the split removed:
const text = "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west."
let sentences = text.split(/(?<!\.)\.(?!\.)/)
  .filter(sentence => sentence.length > 0)
  .map(sentence => sentence.trim() + ".")
console.log(sentences)
CodePudding user response:
To split the text at a single dot and keep the dot in the sentence, you can use a regular expression that splits right after a dot that is not preceded by another dot and is followed by a space or the end of the string. Here is an example:
const text =
"If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west.";
const result = text.split(/(?<=(?<!\.)\.(?= |$))/).map(value => value.trim());
console.log(result);
The whole regular expression is a positive lookbehind ((?<=...)), so the split position sits immediately after the dot and the dot stays in the sentence. Inside it, a negative lookbehind ((?<!\.)) rejects a dot that follows another dot, and a positive lookahead ((?= |$)) checks for a space or the end of the string after the dot.
CodePudding user response:
Maybe something like:
(.*\w\.\s)|(.*\w\.)
(.*\w\.\s) - matches any string ending with a word character (letter, digit or underscore) followed by a dot followed by a whitespace character.
OR
(.*\w\.) - matches any string ending with a word character followed by a dot.
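Note that .* is greedy, so with several sentences in the text one match can swallow more than one of them. A possible variation on the same idea (not the exact pattern above) uses a lazy .*? together with a lookahead and String.prototype.match. The function name matchSentences is an assumption for illustration:

```javascript
// Hypothetical helper; a lazy variant of the "word character followed by a dot" idea.
function matchSentences(text) {
  // .*?        lazily collects the sentence body
  // \w\.       requires a word character directly before the closing dot
  // (?=\s|$)   requires whitespace or the end of the input after that dot
  const matches = text.match(/.*?\w\.(?=\s|$)/g) ?? [];
  return matches.map(sentence => sentence.trim());
}

console.log(matchSentences("If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west."));
```

This sketch only recognizes sentences that end in a dot directly after a word character, so sentences ending in "!" or "?" would need the character class extended.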