Javascript/RegEx: Split a string by commas but ignore commas within double-quotes

Time:07-26

I know similar questions are available but I could not find this case.

CASE 1: 'a,b,c,d,e'

OUTPUT: ["a", "b", "c", "d", "e"]

CASE 2: 'a,b,"c,d", e'

OUTPUT: ["a", "b", "c,d", "e"]

CASE 3: 'a,,"c,d", e'

OUTPUT: ["a", "", "c,d", "e"]

RegEx that I tried: (".*?"|[^",]+)(?=\s*,|\s*$)

RegEx Link: https://regex101.com/r/xImG4i/1

This regex works for CASE 1 and CASE 2 but fails for CASE 3. Instead, it works for

'a, ,"c,d", e', giving output as ["a", " ", "c,d", "e"]

which is also fine, but I need it to work for CASE 3 as well.

Thanks in advance!

CodePudding user response:

You can also match the optional whitespace between two commas, provided lookbehind is supported.

"[^"]*"|[^\s,'"]+(?:\s+[^\s,'"]+)*|(?<=,)\s*(?=,)

const regex = /"[^"]*"|[^\s,'"]+(?:\s+[^\s,'"]+)*|(?<=,)\s*(?=,)/g;

[
  `'a,b,c,d,e'`,
  `'a,b,"c,d", e'`,
  `'a,,"c,d", e'`,
  ` xz a,, b, c, "d, e, f", g, h`,
  `'a, ,"c,d", e'`,
].forEach(s => 
  console.log(s.match(regex))
)

If you don't want the double quotes you can use a capture group with matchAll and check for the group in the callback.

const regex = /"([^"]*)"|[^\s,'"]+(?:\s+[^\s,'"]+)*|(?<=,)\s*(?=,)/g;

[
  `'a,b,c,d,e'`,
  `'a,b,"c,d", e'`,
  `'a,,"c,d", e'`,
  ` xz a,, b, c, "d, e, f", g, h`,
  `'a, ,"c,d", e'`,
].forEach(s =>
  // Group 1 holds the text inside the quotes; compare against undefined so an empty "" still yields ''
  console.log(Array.from(s.matchAll(regex), m => m[1] !== undefined ? m[1] : m[0]))
)
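
If lookbehind is not available, a different technique can produce the same fields: instead of matching the gap between two commas, anchor every match at the start of the string or at a comma and let the field itself be empty. The following is only a sketch under that assumption; `splitFields` is a hypothetical helper name, and unlike the regex above it returns '' rather than ' ' for a whitespace-only field.

```javascript
// Hypothetical helper (not from the answer above): parse fields without lookbehind.
function splitFields(line) {
  // Each match starts at the beginning of the string or at a comma, skips
  // leading whitespace, then takes either a quoted field (group 1) or a run
  // of characters that are neither commas nor quotes (group 2, possibly empty).
  const fieldRe = /(?:^|,)\s*(?:"([^"]*)"|([^",]*))/g;
  return Array.from(line.matchAll(fieldRe), m =>
    // Group 1 is undefined for unquoted fields; trim trailing spaces from those.
    m[1] !== undefined ? m[1] : m[2].trimEnd()
  );
}

console.log(splitFields('a,b,c,d,e'));    // [ 'a', 'b', 'c', 'd', 'e' ]
console.log(splitFields('a,b,"c,d", e')); // [ 'a', 'b', 'c,d', 'e' ]
console.log(splitFields('a,,"c,d", e'));  // [ 'a', '', 'c,d', 'e' ]
```

Text that directly follows a closing quote without a comma (for example `"a"b`) is skipped by this sketch, so it assumes well-formed input.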

CodePudding user response:

An alternate solution that uses a regex for splitting instead of matching:

/,\s*(?=(?:(?:[^"]*"){2})*[^"]*$)/

This regex splits on a comma followed by optional whitespace, but only when the comma is outside double quotes: the lookahead asserts that an even number of quotes follows the comma and whitespace.


Code Sample:

const re = /,\s*(?=(?:(?:[^"]*"){2})*[^"]*$)/;

[
  `a,b,"c,d", e`,
  `a,,"c,d", e`,
  ` xz a,, b, c, "d, e, f", g, h`,
  `a, ,"c,d", e`,
].forEach(s => {
  // Split on unquoted commas, then strip the surrounding double quotes from each token
  const tok = s.split(re).map(e => e.replace(/^"|"$/g, ''));

  console.log(s, '::', tok);
})
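
To make the even-quotes rule concrete, the lookahead can be spelled out as a plain loop: a comma counts as a separator only if the number of double quotes after it is even. This is an illustrative sketch rather than a replacement for the regex, and the function names are made up for the example.

```javascript
// Hypothetical helpers illustrating the rule the lookahead encodes.
function countQuotesAfter(s, index) {
  let count = 0;
  for (let i = index + 1; i < s.length; i++) {
    if (s[i] === '"') count++;
  }
  return count;
}

// Returns the indices of commas that lie outside double quotes,
// i.e. commas followed by an even number of quote characters.
function commaPositionsOutsideQuotes(s) {
  const positions = [];
  for (let i = 0; i < s.length; i++) {
    if (s[i] === ',' && countQuotesAfter(s, i) % 2 === 0) {
      positions.push(i);
    }
  }
  return positions;
}

console.log(commaPositionsOutsideQuotes('a,"c,d", e')); // [ 1, 7 ]
```

The comma at index 4 is followed by one quote, so it is treated as part of the quoted field. The same reasoning shows the limitation of the approach: if the input contains an unbalanced quote, every comma before it is followed by an odd number of quotes and is not split on.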
