JS split string on positive lookahead, avoid overlapping cases

Time:07-22

I have a set of data that includes dated notes, all concatenated, as in the example below. Assume the date always comes at the beginning of its note. I'd like to split these into individual notes. I've used a positive lookahead so I can keep the delimiter (the date).

Here's what I'm doing:

  const notes = "[3/28- A note. 3/25- Another note. 3/24- More text. 10/19- further notes. [10/18- Some more text.]"
  const pattern = /(?=\d{1,2}\/\d{1,2}[- ]+)/g
  console.log(notes.split(pattern))

and the result is

[ '[',
  '3/28- A note. ',
  '3/25- Another note. ',
  '3/24- More text. ',
  '1',
  '0/19- further notes. [',
  '1',
  '0/18- Some more text.]' ]

The lookahead succeeds at two adjacent positions: \d{1,2} can match the 10 of 10/19 and also the 0 of 0/19, so the string is split before both. Instead I'd like to have

[ '[',
  '3/28- A note. ',
  '3/25- Another note. ',
  '3/24- More text. ',
  '10/19- further notes. [',
  '10/18- Some more text.]' ]

(I can handle the extraneous brackets later.)

How can I accomplish this split with regex or any other technique?

CodePudding user response:

To get the output you want, add a word boundary (\b) at the start of the lookahead. You can also omit the plus sign at the end of the pattern, since the lookahead only needs to see one delimiter character.

(?=\b\d{1,2}\/\d{1,2}[- ])


const notes = "[3/28- A note. 3/25- Another note. 3/24- More text. 10/19- further notes. [10/18- Some more text.]"
const pattern = /(?=\b\d{1,2}\/\d{1,2}[- ])/g
console.log(notes.split(pattern))
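
To see why the word boundary helps, here is a small side-by-side sketch. The variable names (overlapping, anchored, before, after) are illustrative only and not part of the answer above. \b matches only between a word character and a non-word character, so it cannot match between the 1 and the 0 of 10/19.

```javascript
const notes = "[3/28- A note. 3/25- Another note. 3/24- More text. 10/19- further notes. [10/18- Some more text.]";

// Lookahead without a boundary: succeeds before "10/19" and again before "0/19".
const overlapping = /(?=\d{1,2}\/\d{1,2}[- ])/g;

// Lookahead with \b: there is no word boundary between "1" and "0",
// so the position in the middle of "10" is rejected.
const anchored = /(?=\b\d{1,2}\/\d{1,2}[- ])/g;

const before = notes.split(overlapping);
const after = notes.split(anchored);

console.log(before.length); // 8
console.log(after.length);  // 6
console.log(after[4]);      // '10/19- further notes. ['
```
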

CodePudding user response:

I would avoid split() here and instead use match():

var notes = "[3/28- A note. 3/25- Another note. 3/24- More text. 10/19- further notes. [10/18- Some more text.]";
var matches = notes.match(/\[?\d+\/\d+\s*-\s*.*?\.\]?/g);
console.log(matches);

You may do a further cleanup of leading/trailing brackets using regex, e.g.

var input = "[10/18- Some more text.]";
var output = input.replace(/^\[|\]$/g, "");  // g flag so both the leading and trailing bracket are removed
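
The two steps can be combined by running the cleanup over every match. This is only a sketch under a few assumptions: the cleanup regex carries the g flag so that both brackets are stripped, every note ends with a period (the match pattern depends on it), and the names matches and cleaned are illustrative.

```javascript
const notes = "[3/28- A note. 3/25- Another note. 3/24- More text. 10/19- further notes. [10/18- Some more text.]";

// match() returns null when nothing matches, so fall back to an empty array.
const matches = notes.match(/\[?\d+\/\d+\s*-\s*.*?\.\]?/g) || [];

// Strip one leading "[" and one trailing "]" from each note, if present.
const cleaned = matches.map((note) => note.replace(/^\[|\]$/g, ""));

console.log(cleaned);
// [ '3/28- A note.',
//   '3/25- Another note.',
//   '3/24- More text.',
//   '10/19- further notes.',
//   '10/18- Some more text.' ]
```
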

CodePudding user response:

Try .replaceAll() and this regex:

/(\[?\d{1,2}.+?\.)/
// Replacement
"\n$1"

Figure I - Regex

Segment     Description
(\[?        Begin capture group; match a literal "[" zero or one time
\d{1,2}     Match a digit one or two times
.+?\.)      Lazily match any character one or more times, stopping after the first literal "."; end capture group

Figure II - Replacement

Segment     Description
\n          New line
$1          Everything matched in the capture group (...)

const notes = "[3/28- A note. 3/25- Another note. 3/24- More text. 10/19- further notes. [10/18- Some more text.]";
const rgx = new RegExp(/(\[?\d{1,2}.+?\.)/, 'g');
let result = notes.replaceAll(rgx, "\n$1");

console.log(result);
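
If an array of notes is wanted rather than one newline-delimited string, the result could be split on the inserted newlines afterwards. A minimal sketch, assuming the notes themselves contain no newlines; the name list is illustrative, and replace() with a g-flagged regex is used here because it behaves the same as replaceAll() for this pattern.

```javascript
const notes = "[3/28- A note. 3/25- Another note. 3/24- More text. 10/19- further notes. [10/18- Some more text.]";
const rgx = /(\[?\d{1,2}.+?\.)/g;

// Put a newline in front of every note.
const result = notes.replace(rgx, "\n$1");

// Split on the inserted newlines, trim stray spaces, and drop the empty leading entry.
const list = result
  .split("\n")
  .map((note) => note.trim())
  .filter((note) => note.length > 0);

console.log(list.length); // 5
console.log(list[3]);     // '10/19- further notes.'
```

Note that the brackets are still present in the first and last entries, so the same bracket cleanup as in the other answers would still apply.
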
