Javascript: Regex to convert a numbered list string to an array of items

Time:07-15

I am trying to convert a numbered list into an array of items. This is what I have so far:

let input = `1. This is a text\n    where each item can span over multiple lines\n  1.1 this is another item\n 1.2another item\n  2. that I want to\n    extract each seperate\n    item from\n    3. How can I do that?`;

let regex = /(\d+\.\d+|\d+)\s(.*)/g;
let matches = input.match(regex);
console.log(matches);

This only produces the following output:

"1.1 this is another item"

What I would like is something like this:

"1. This is a text"
"1.1 this is another item"
"1.2another item"
...and so on

Why is it matching only one item out of this string? What am I doing wrong and how can I fix it?

Answer:

Your regex does not allow for a dot after a number when no second number follows it (as in "1." or "2."). It also requires whitespace after the number, but one of your items, "1.2another item", has none, so make it optional.
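You can see this by checking only the number prefix. The sketch below tests the item starts from your input against the prefix of your regex and against a more lenient prefix that allows an optional second number and optional whitespace. The ^ anchors are added only for this comparison:

```javascript
// Item starts taken from the input in the question.
const starts = ["1. This", "1.1 this", "1.2another", "2. that", "3. How"];

// Prefix of the original regex: "1.2" or "1", then mandatory whitespace.
const original = /^(\d+\.\d+|\d+)\s/;
// More lenient prefix: digits, a dot, optional digits, optional whitespace.
const lenient = /^\d+\.\d*\s?/;

const originalResults = starts.map((s) => original.test(s));
const lenientResults = starts.map((s) => lenient.test(s));

console.log(originalResults); // [ false, true, false, false, false ]
console.log(lenientResults);  // [ true, true, true, true, true ]
```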

Also, use the s modifier so that . also matches newline characters.
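Here is a minimal illustration of what the s flag (dotAll, available since ES2018) changes, using a made-up two-line string:

```javascript
const text = "first line\nsecond line";

// Without the s flag, . does not match the newline, so the match stops there.
const withoutS = text.match(/first.*/)[0];
// With the s flag, . also matches the newline.
const withS = text.match(/first.*/s)[0];

console.log(JSON.stringify(withoutS)); // "first line"
console.log(JSON.stringify(withS));    // "first line\nsecond line"
```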

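Since an item can span several lines (and a new item might even start on the same line as the previous one), you need a look-ahead to determine where a match must end: right before the next number followed by a dot, or at the end of the input.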
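You can see the effect of the look-ahead on a hypothetical line that holds two items. This sketch uses the number pattern from this answer without its capture groups:

```javascript
// Hypothetical input: two items on the same line.
const line = "1. first item 2. second item";

// Without a look-ahead, the lazy .*? is satisfied by matching nothing.
const withoutLookahead = line.match(/\d+\.\d*\s?.*?/gs);
// With the look-ahead, .*? grows until the next "digits + dot" or the end of input.
const withLookahead = line.match(/\d+\.\d*\s?.*?(?=\d+\.|$)/gs);

console.log(withoutLookahead); // [ '1. ', '2. ' ]
console.log(withLookahead);    // [ '1. first item ', '2. second item' ]
```

Without the look-ahead, each match contains only the number, because the lazy quantifier has no reason to consume anything more.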

Correction:

let input = `1. This is a text\n    where each item can span over multiple lines\n  1.1 this is another item\n 1.2another item\n  2. that I want to\n    extract each seperate\n    item from\n    3. How can I do that?`;

let regex = /(\d+\.\d*)\s?(.*?)(?=\d+\.|$)/gs;
let matches = input.match(regex);
console.log(matches);
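match() with the g flag returns only the whole matches, so the capture groups in this regex are not used, and each match still ends with the line break and indentation that come before the next item. If you want the number and the text separately, one option is matchAll (ES2020) with the same regex. In this sketch, trimming the text and collapsing the line breaks inside an item into single spaces are assumptions about the output you want:

```javascript
const input = `1. This is a text\n    where each item can span over multiple lines\n  1.1 this is another item\n 1.2another item\n  2. that I want to\n    extract each seperate\n    item from\n    3. How can I do that?`;

const regex = /(\d+\.\d*)\s?(.*?)(?=\d+\.|$)/gs;

// Each match array is [whole match, number, text]; skip the whole match.
const items = [...input.matchAll(regex)].map(([, number, text]) => ({
  number,
  // Trim, then collapse the newline plus indentation inside an item into single spaces.
  text: text.trim().replace(/\s+/g, " "),
}));

console.log(items);
```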

Answer:

Another option, using the negated character class \D (any character that is not a digit):

\b\d+\.\D*(?:\d(?!\.)\D*)*

Explanation

  • \b\d+\. A word boundary, then match 1+ digits and a dot
  • \D* Optionally match non-digits
  • (?:\d(?!\.)\D*)* Optionally repeat: match a digit that is not directly followed by a dot, then optional non-digits

let input = `1. This is a text\n    where each item can span over multiple lines\n  1.1 this is another item\n 1.2another item\n  2. that I want to\n    extract each seperate\n    item from\n    3. How can I do that?`;

let regex = /\b\d+\.\D*(?:\d(?!\.)\D*)*/g;
let matches = input.match(regex);
console.log(matches);
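Because this pattern accepts digits followed by a dot anywhere, an item whose own text contains something like 1.2 or 3. is split into several matches. Here is a small sketch with a hypothetical two-item input, using the pattern \b\d+\.\D*(?:\d(?!\.)\D*)*:

```javascript
// Hypothetical input: the first item mentions "1.2" and "3." in its own text.
const text = "1. This is a text with a number 1.2 and 3.\n  2. next item";
const regex = /\b\d+\.\D*(?:\d(?!\.)\D*)*/g;

const parts = text.match(regex);
console.log(parts.length); // 4, although there are only 2 items
console.log(parts);
```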

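If the number of an item must be at the start of a line, so that numbers such as 1.2 or 3. inside the text do not start a new item, you can match the first line of an item and then every following line that does not start with optional whitespace, digits and a dot:

^[^\S\n]*\d+\..*(?:\n(?![^\S\n]*\d+\.).*)*

Here [^\S\n]* matches whitespace other than a newline, and with the m flag ^ matches at the start of every line.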

let input = "1. This is a text with a number 1.2 and 3.\n    where each item can span over multiple lines\n  1.1 this is another item\n 1.2another item\n  2. that I want to\n    extract each seperate\n    item from\n    3. How can I do that?";

let regex = /^[^\S\n]*\d+\..*(?:\n(?![^\S\n]*\d+\.).*)*/gm;
let matches = input.match(regex).map(s => s.trim());
console.log(matches);
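The pieces of this line-anchored pattern can also be assembled from named parts with the RegExp constructor, which may be easier to read and adjust. This is a sketch; the constant names and the sample string are only illustrative:

```javascript
// Whitespace other than a newline (spaces, tabs) before an item number.
const indent = String.raw`[^\S\n]*`;
// An item number: 1+ digits followed by a dot.
const number = String.raw`\d+\.`;
// First line of an item: start of a line, indentation, number, rest of the line.
const firstLine = String.raw`^${indent}${number}.*`;
// Following lines, as long as they do not start a new item themselves.
const continuation = String.raw`(?:\n(?!${indent}${number}).*)*`;

// m: ^ matches at every line start; g: return all items.
const regex = new RegExp(firstLine + continuation, "gm");

const sample = "1. first item 1.2 inside\n   continued\n2. second item";
const items = sample.match(regex).map((s) => s.trim());

console.log(regex.source);
console.log(items);
```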
