How to split text only by one dot, but not by triple dots?

Time:12-28

I have this text:

const text = "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west."

And I need to split it by a dot (and keep the dot in the sentence), but I only need to split it by one dot, so I want the following array as a result:

const result = [
 "If you look at a map of Europe... you will... notice.",
 "That apart from the big... landmass known as the continent, there are two small islands to the west."
]

I have already created a function that can split sentences by a dot, question mark, and exclamation mark, but it doesn't work properly when there are triple dots in the sentence.

function splitByPunctuationMark(str) {
 return str.split(/(?<=[!.?])/).map(value => value.trim())
}

UPDATED

splitByPunctuationMark() gives me the following result:

#1 When there are no triple dots in the source text

const result = splitByPunctuationMark("If you look at a map of Europe you will notice. That apart from the big landmass known as the continent, there are two small islands to the west.")

console.log(result)
/*
[
 "If you look at a map of Europe you will notice.", 
 "That apart from the big landmass known as the continent, there are two small islands to the west."
]
*/

#2 When there are triple dots in the source text

const result = splitByPunctuationMark("If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west.")

console.log(result)
/*
[
 "If you look at a map of Europe.", 
 ".", 
 ".", 
 "you will.", 
 ".", 
 ".", 
 "notice.", 
 "That apart from the big.", 
 ".", 
 ".", 
 "landmass known as the continent, there are two small islands to the west."
]
*/

CodePudding user response:

Here is one way of doing it. First, I make every "..." disappear by replacing it with a placeholder string. This placeholder needs to be chosen carefully so that it does not occur anywhere in the target string. After splitting at the remaining single "." characters, I put the "..." back into their original positions.

const text = "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west.And here is a third ... is it a sentence?   And a forth!"


const sep = "@threedots@";
const res = text
  .replaceAll("...", sep)
  .split(/(?<=[.!?])\s*/)
  .map(e => e.replaceAll(sep, "..."));

console.log(res);

To preserve the "." at the end of each sentence, I used a lookbehind in the regular expression: /(?<=[.!?])\s*/. This treats any run of zero or more whitespace characters as a separator if it occurs immediately after a ".", a "!" or a "?".
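The placeholder idea can be wrapped in a small function that refuses to run when the placeholder already appears in the input. This is only a sketch: the name splitProtectingEllipses and the guard are assumptions added for illustration, not part of the answer above.

```javascript
// Hypothetical helper: protect "..." with a placeholder, split after
// ".", "!" or "?", then restore the ellipses.
function splitProtectingEllipses(text, placeholder = "@threedots@") {
  // Assumption: fail loudly if the placeholder already occurs in the input,
  // because restoring it afterwards would corrupt the text.
  if (text.includes(placeholder)) {
    throw new Error(`Placeholder "${placeholder}" already occurs in the input`);
  }
  return text
    .replaceAll("...", placeholder)
    .split(/(?<=[.!?])\s*/)
    .map(part => part.replaceAll(placeholder, "..."));
}

const ellipsisSample = "A... b. C!  D?";
console.log(splitProtectingEllipses(ellipsisSample));
// → [ 'A... b.', 'C!', 'D?' ]
```

Note that only runs of exactly three dots are protected this way; a run of four dots would still leave one dot behind to split on.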

@Martin Niederl quite rightly remarked on the possibility of any number of repeated dots occurring. He presented a solution which I also consider helpful. Here is my take on it (also allowing for other end-of-sentence characters):

const text = "If you look at a map of Europe..... you will... notice. That apart from the big.. landmass known as the continent, there are two small islands to the west.And here is a third .... Is it a sentence?   And a forth!"
const res=text.split(/(?<=(?<!\.)[.!?](?!\.))\s*/);
console.log(res);

I now have a positive lookbehind that contains a pattern consisting of a negative lookbehind for a ".", followed by exactly one of the characters ".", "!" or "?", and then a negative lookahead for another ".". Immediately after the positive lookbehind I require a sequence of zero or more whitespace characters.
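As a sketch of how this pattern could replace the function from the question, the regular expression can be dropped into a small helper. The name splitSentences and the filter step for empty input are assumptions added here.

```javascript
// Hypothetical replacement for the question's splitByPunctuationMark().
function splitSentences(str) {
  // Split right after a ".", "!" or "?" that has no "." directly before or
  // after it, and swallow any whitespace that follows.
  return str
    .split(/(?<=(?<!\.)[.!?](?!\.))\s*/)
    // An empty input string would otherwise yield [""].
    .filter(sentence => sentence.length > 0);
}

console.log(splitSentences(
  "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west."
));
// → [
//   'If you look at a map of Europe... you will... notice.',
//   'That apart from the big... landmass known as the continent, there are two small islands to the west.'
// ]
```

Because the end-of-sentence character must not touch another dot, a sentence ending in something like "...?" is not treated as a boundary by this sketch.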

CodePudding user response:

/(?<!\.)\.(?!\.)/ matches only individual dots. You could then split it based on that, remove empty sentences, trim the sentences and reapply the stripped dot:

const text = "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west."

let sentences = text.split(/(?<!\.)\.(?!\.)/)
                    .filter(sentence => sentence.length > 0)
                    .map(sentence => sentence.trim() + ".")
                    
console.log(sentences)

CodePudding user response:

To split the text at single dots and keep each dot in its sentence, you can split at the position right after a dot that is not preceded by another dot and is followed by a space or the end of the string. Here is an example:

const text =
  "If you look at a map of Europe... you will... notice. That apart from the big... landmass known as the continent, there are two small islands to the west.";

const result = text.split(/(?<=(?<!\.)\.)(?= |$)/).map(value => value.trim());

console.log(result);

This regular expression uses a positive lookbehind ((?<=(?<!\.)\.)) to check that the split position comes right after a dot that is not itself preceded by another dot, and a positive lookahead ((?= |$)) to check for a space or the end of the string after that dot. Because both parts are lookarounds, the match itself is empty and the dot stays in the sentence.

CodePudding user response:

Maybe something like:

(.*\w\.\s)|(.*\w\.)

(.*\w\.\s) - matches any string ending with a word character (letter, digit, or underscore) followed by a dot followed by a whitespace character.

OR

(.*\w\.) - matches any string ending with a word character followed by a dot.

https://regex101.com/r/pk68Ix/1
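One possible way to apply this pattern in JavaScript is String.prototype.match with the global flag. The sketch below also shows a caveat: the greedy .* merges sentences when there are more than two. The lazy variant is an assumption added here, not part of the answer, and the sample text is made up for illustration.

```javascript
const threeSentences = "One... two. Three. Four... five.";

// The pattern from the answer, applied globally. Its greedy ".*" runs to the
// last "word character, dot, whitespace" it can find, so sentences get merged.
const greedyPattern = /(.*\w\.\s)|(.*\w\.)/g;
const greedyResult = (threeSentences.match(greedyPattern) ?? []).map(s => s.trim());

// Assumed variant: a lazy ".*?" stops at the first dot that follows a word
// character and is itself followed by whitespace or the end of the string.
const lazyPattern = /.*?\w\.(?:\s|$)/g;
const lazyResult = (threeSentences.match(lazyPattern) ?? []).map(s => s.trim());

console.log(greedyResult); // → [ 'One... two. Three.', 'Four... five.' ]
console.log(lazyResult);   // → [ 'One... two.', 'Three.', 'Four... five.' ]
```

Both patterns require a word character directly before the final dot, so a sentence ending in, say, a closing parenthesis followed by a dot would not be picked up.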
