How can I include the delimiter with regex String.split()?

Time:05-04

I need to parse the tokens from a GS1 UDI format string:

"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"

I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:

[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]

Below is a JSFiddle with examples, but in case you want to see it right here:

//  This includes the delimiter match in the results, but I want the delimiter included WITH the value
//  after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
//  Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]

//  If I include a pattern that should (I think) match the content following the delimiter I will 
//  only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
//  Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]

//  I think this is because I'm effectively matching the entire string, hence a single result.
//  So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
//  Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]

I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.

I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/

I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.

CodePudding user response:

You can split at each position where a parenthesised element follows, by using a zero-length lookahead assertion:

const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
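
To make the behaviour of the lookahead split concrete, here is a minimal sketch using the `\d{2,}` quantifier from the question instead of `\d+` (that substitution, and the names `input` and `tokens`, are illustrative choices rather than part of the answer). The lookahead matches an empty position, so split() consumes no characters and each "(nn)" stays attached to the value that follows it:

```javascript
// Sketch: split at every position that is immediately followed by "(digits)".
// (?=...) is a zero-length assertion, so nothing is removed from the string.
const input = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
const tokens = input.split(/(?=\(\d{2,}\))/);

console.log(tokens);
// ["(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888"]
```

No leading empty string appears, because split() does not cut at an empty match at the very start of the input.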

CodePudding user response:

Instead of split, use match to create the array. The pattern matches 1) digits in parentheses, followed by 2) one or more digits, uppercase letters, or hyphens, with the whole expression wrapped in a group. The g flag makes match return every occurrence.

(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)

const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';

console.log(str.match(re));
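
Since the question notes that the order of the application identifiers is not guaranteed, a matching approach can also be extended to look values up by identifier. The following is only a sketch: `parseUdi` is a hypothetical helper, not something from the thread, and it assumes that values never contain a "(" character.

```javascript
// Hypothetical helper: collect identifier/value pairs into a plain object,
// independent of the order in which the identifiers appear.
function parseUdi(input) {
  // Group 1: the digits inside the parentheses.
  // Group 2: everything up to the next "(" (assumption: values contain no "(").
  const pattern = /\((\d{2,})\)([^(]+)/g;
  const result = {};
  for (const [, id, value] of input.matchAll(pattern)) {
    result[id] = value;
  }
  return result;
}

const fields = parseUdi("(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888");
console.log(fields["10"]); // ABC123
console.log(fields["17"]); // 2022-04-01
```

String.prototype.matchAll requires a regex with the g flag and is available in ES2020 environments. Unlike the `[\d\-A-Z]` character class, `[^(]+` accepts any value characters, which is more permissive than the original answer.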
