I am trying to convert a numbered list into an array of items. This is what I have so far:
let input = `1. This is a text\n where each item can span over multiple lines\n 1.1 this is another item\n 1.2another item\n 2. that I want to\n extract each seperate\n item from\n 3. How can I do that?`;
let regex = /(\d+\.\d+|\d+)\s(.*)/g;
let matches = input.match(regex);
console.log(matches);
This only produces the following output:
"1.1 this is another item"
What I would like is something like this:
"1. This is a text"
"1.1 this is another item"
"1.2another item"
...and so on
Why is it matching only one item out of this string? What am I doing wrong and how can I fix it?
CodePudding user response:
Your regex does not allow for a dot after a number when no second number follows it (as in 1. or 2.). It also requires a whitespace character after the number, but in 1.2another there is none, so make that whitespace optional.
Also, use the s (dotAll) flag so that . also matches newline characters.
If a new item can start on the same line, you'll need a look-ahead to foresee where a match must end.
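To make the effect of the s flag and the look-ahead concrete, here is a minimal sketch on a shortened, made-up string (the variable names are only illustrative):

```javascript
// Hypothetical two-item sample; the second item starts on the same line.
const sample = "1. first\n more 2. second";

// Without the s flag, . stops at the first newline.
const noDotAll = sample.match(/\d+\..*/)[0];

// With the s flag, a greedy .* runs to the end of the string.
const greedyDotAll = sample.match(/\d+\..*/s)[0];

// A lazy .*? plus a look-ahead stops right before the next "digits and dot".
const lazyWithLookahead = sample.match(/\d+\..*?(?=\d+\.|$)/s)[0];

console.log(JSON.stringify(noDotAll));          // "1. first"
console.log(JSON.stringify(greedyDotAll));      // "1. first\n more 2. second"
console.log(JSON.stringify(lazyWithLookahead)); // "1. first\n more "
```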
Correction:
let input = `1. This is a text\n where each item can span over multiple lines\n 1.1 this is another item\n 1.2another item\n 2. that I want to\n extract each seperate\n item from\n 3. How can I do that?`;
let regex = /(\d+\.\d*)\s?(.*?)(?=\d+\.|$)/gs;
let matches = input.match(regex);
console.log(matches);
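Note that match with the g flag returns only the full matches and ignores the capture groups. If the number and the text are wanted separately, something along these lines might work. This is a sketch only: parseNumberedList is a hypothetical helper name, and collapsing whitespace inside an item is an assumption about the desired output.

```javascript
// Sketch: split each item into its number and its text via the capture groups.
// parseNumberedList is a hypothetical helper, not part of the answer above.
function parseNumberedList(text) {
  // matchAll requires the g flag; the s flag lets . match newlines.
  const regex = /(\d+\.\d*)\s?(.*?)(?=\d+\.|$)/gs;
  return Array.from(text.matchAll(regex), ([, number, body]) => ({
    number,
    // Assumption: line breaks and indentation inside an item are collapsed.
    text: body.replace(/\s+/g, " ").trim(),
  }));
}

const items = parseNumberedList(
  "1. This is a text\n where each item can span over multiple lines\n 1.1 this is another item\n 1.2another item\n 2. that I want to\n extract each seperate\n item from\n 3. How can I do that?"
);
console.log(items.length);  // 5
console.log(items[2]);      // { number: '1.2', text: 'another item' }
```

As with the pattern itself, any "digits followed by a dot" inside an item's text would be treated as the start of a new item.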
CodePudding user response:
Another option using a negated character class:
\b\d+\.\D*(?:\d(?!\.)\D*)*
Explanation
\b\d+\. : a word boundary, then match 1+ digits and a dot
\D* : optionally match non-digits
(?:\d(?!\.)\D*)* : optionally repeat matching a digit, asserting not a dot directly to the right, followed by optional non-digits
let input = `1. This is a text\n where each item can span over multiple lines\n 1.1 this is another item\n 1.2another item\n 2. that I want to\n extract each seperate\n item from\n 3. How can I do that?`;
let regex = /\b\d+\.\D*(?:\d(?!\.)\D*)*/g;
let matches = input.match(regex);
console.log(matches);
If an item number should only count when it appears at the start of a line (optionally after spaces or tabs), anchor the match there and keep consuming the following lines as long as they do not themselves start with digits and a dot:
^[^\S\n]*\d+\..*(?:\n(?![^\S\n]*\d+\.).*)*
let input = "1. This is a text with a number 1.2 and 3.\n where each item can span over multiple lines\n 1.1 this is another item\n 1.2another item\n 2. that I want to\n extract each seperate\n item from\n 3. How can I do that?";
let regex = /^[^\S\n]*\d+\..*(?:\n(?![^\S\n]*\d+\.).*)*/gm;
let matches = input.match(regex).map(s => s.trim());
console.log(matches);
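To illustrate why the line anchor matters, the sketch below compares an unanchored pattern (in the style of the earlier one) with the anchored pattern on a made-up input whose first item mentions numbers such as 1.2 and 3. in its text. The sample string and the variable names are assumptions for illustration only.

```javascript
// Hypothetical input: item 1 contains "1.2" and "3." inside its own text.
const sampleList = "1. See section 1.2 and step 3.\n for details\n 2. Next item";

// Unanchored: any "digits and dot" starts a new match, even mid-sentence.
const unanchored = /\b\d+\.\D*(?:\d(?!\.)\D*)*/g;

// Anchored: an item may only start at the beginning of a line (m flag),
// optionally preceded by horizontal whitespace.
const anchored = /^[^\S\n]*\d+\..*(?:\n(?![^\S\n]*\d+\.).*)*/gm;

const unanchoredItems = sampleList.match(unanchored).map(s => s.trim());
const anchoredItems = sampleList.match(anchored).map(s => s.trim());

console.log(unanchoredItems.length); // 4 (the first item is split at "1.2" and "3.")
console.log(anchoredItems.length);   // 2
console.log(anchoredItems);
```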