Regex to convert CSS selector into blocks

I would like to create an Array out of this string:

// string
'a b[text="Fly to San Fran",text2="More"] c foo[text=Fly to San Fran,text2=More] bar d'

// resulting array:
[
    'a',
    'b[text="Fly to San Fran",text2="More"]',
    'c',
    'foo[text=Fly to San Fran,text2=More]',
    'bar',
    'd'
]

What would a regex to split the string look like, or is this the wrong approach?

So far I have tried the following, which produces far too many null values:

/([a-zA-Z]*\[[a-z]*\])|([\w]*)/g

=>
[
   'a',
   null,
   'b[text="Fly to San Fran",text2="More"]',
   null,
   'c',
   null,
   'foo',
   null,
   '[text=Fly to San Fran,text2=More]',
   null,
   'bar',
   null,
   'd'
]

CodePudding user response：

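\[[a-z]*\] matches only lowercase letters inside [...], but the attribute blocks also contain ", =, commas, digits and spaces. Use a negated character class instead: \[[^\]\[]*\] matches any run of characters that are neither [ nor ]. The second alternative should also be \w+ rather than \w*, so that it can never match an empty string.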

const s = `a b[text="Fly to San Fran",text2="More"] ` +
          `c foo[text=Fly to San Fran,text2=More] bar d`;

// Either an optional tag name followed by a [...] block, or a bare word.
let res = s.match(/[a-zA-Z]*\[[^\]\[]*\]|\w+/g);

res.forEach(element => console.log(element));
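
To make the difference between \w* and \w+ concrete, here is a minimal sketch that wraps the pattern above in a helper and compares it with a variant that allows zero-length matches. The name splitSelector is only illustrative and not part of the answer above; the sketch assumes tag names consist of word characters and that brackets are never nested.

```javascript
// Hypothetical helper: splits a selector-like string into blocks.
function splitSelector(selector) {
  // Alternative 1: optional tag name plus a [...] block without nested brackets.
  // Alternative 2: a bare word; `\w+` requires at least one character.
  return selector.match(/[a-zA-Z]*\[[^\]\[]*\]|\w+/g) || [];
}

const input =
  'a b[text="Fly to San Fran",text2="More"] ' +
  'c foo[text=Fly to San Fran,text2=More] bar d';

const blocks = splitSelector(input);

// Same pattern with `\w*`: it also matches the empty string at every space,
// which is where the unwanted empty entries come from.
const loose = input.match(/[a-zA-Z]*\[[^\]\[]*\]|\w*/g);

console.log(blocks);
console.log(loose.filter(m => m === '').length, 'empty matches with \\w*');
```

If the result still contains unwanted entries for your real input, filtering out empty strings afterwards is a workable fallback, but requiring at least one character in the pattern avoids them in the first place.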

CodePudding user response：

Using a regex in this case would be extremely tedious.

I would recommend using a CSS selector parsing library.

The following example uses the parsel library to tokenize the selector, then reduces over the tokens to merge adjacent ones that are not separated by a space.

const str = `a b[text="Fly to San Fran",text2="More"] c foo[text=Fly to San Fran,text2=More] bar d`
const tokens = parsel.tokenize(str).map(e => e.content)
const res = tokens.slice(1).reduce((acc, curr) => {
  const prev = acc[acc.length - 1]
  // A " " token starts a new entry; anything else is appended to the current one.
  if (curr == " " || prev == " ") acc.push(curr)
  else acc[acc.length - 1] += curr
  return acc
}, [tokens[0]]).filter(e => e != " ")
console.log(res)

<script src="https://projects.verou.me/parsel/dist/nomodule/parsel.js"></script>
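
The merging step can also be looked at in isolation, without loading the library. The sketch below runs the same idea over a hand-written token list. That list is an assumption made for illustration, not actual parsel output, and groupTokens is a hypothetical name.

```javascript
// Hand-written stand-in for tokenizer output (an assumption, not real parsel
// output): a " " entry marks the space between two compound selectors.
const sampleTokens = [
  'a', ' ',
  'b', '[text="Fly to San Fran",text2="More"]', ' ',
  'c'
];

// Hypothetical helper: joins consecutive non-space tokens into one block.
function groupTokens(tokens) {
  const groups = [];
  let startNew = true; // true when the next token should open a new block
  for (const token of tokens) {
    if (token === ' ') {
      startNew = true;
    } else if (startNew) {
      groups.push(token);
      startNew = false;
    } else {
      groups[groups.length - 1] += token;
    }
  }
  return groups;
}

console.log(groupTokens(sampleTokens));
// → [ 'a', 'b[text="Fly to San Fran",text2="More"]', 'c' ]
```

Tracking a flag instead of pushing and later filtering the " " entries gives the same grouping in a single pass.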