How to do array split with regex?

Time:11-16

I have a string that I need to convert into an array of objects:

const str = "addias (brand|type) sneakers(product) for men(o)"

Expected output:

let output = [
  {
    key: "addias",
    value: ["brand", "type"]
  },
  {
    key: "sneakers",
    value: ["product"]
  },
  {
    key: "for men",
    value: []
  }
]

Code I tried:

function gerateSchema(val) {
       let split = val.split(" ")
       let maps = split.map((i) => {
           let obj = i.split("(")
           let key = obj[0].replaceAll(/\s/g, "")
           let cleanValue = obj[1].replace(/[{()}]/g, "")
           let stripedValues = cleanValue.split("|")

           return {
               key: key,
               value: stripedValues,
           }
       })
       return maps

}
let out = gerateSchema(str)

But this breaks when a key contains a space, for example "for men".

How can I do the split with a regex?

CodePudding user response:

One approach is to first do a regex "find all" to collect every key/value term in the original string. Then iterate over the matches and, for each one, build an object holding the key and its array of values.

var str = "addias (brand|type) sneakers(product) for men(o)";
var matches = str.match(/\w+(?: \w+)*\s*\(.*?\)/g);
var array = [];
for (var i=0; i < matches.length; ++i) {
    var parts = matches[i].split(/\s*(?=\()/);
    var map = {};
    map["key"] = parts[0];
    map["value"] = parts[1].replace(/^\(|\)$/g, "").split(/\|/);
    array.push(map);
}
console.log(array);

The first regex matches each key/value string:

\w+        match a word
(?: \w+)*  followed by a space, and another word, the quantity zero or more times
\s*        optional whitespace
\(         (
.*?        pipe separated value string
\)         )

Then, we split each term on \s*(?=\(), which is the space(s) immediately preceding the (...|...) term. Finally, we split the value string on pipe | to generate the set of values.
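To make those steps easier to follow, here is a minimal sketch that keeps the intermediate result of each step so it can be logged. The names termPattern, terms and steps are only illustrative and are not part of the answer above.

```javascript
const str = "addias (brand|type) sneakers(product) for men(o)";

// Step 1: find every "key (values)" term in the string.
const termPattern = /\w+(?: \w+)*\s*\(.*?\)/g;
const terms = str.match(termPattern) ?? [];

// Steps 2 and 3: split each term just before "(", then split the values on "|".
const steps = terms.map((term) => {
  const [key, rawValues] = term.split(/\s*(?=\()/);
  return {
    term,
    key,
    rawValues,
    // Drop the surrounding parentheses before splitting on the pipe.
    values: rawValues.slice(1, -1).split("|"),
  };
});

console.log(terms);
console.log(steps);
```

Logging terms first is a quick way to check whether the first regex picks up multi-word keys such as "for men" before worrying about the split.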

CodePudding user response:

An alternative is to skip the regex: split the string on ) first, then split each piece on (.

const str = "addias (brand|type) sneakers(product) for men(o)"
const array = str.split(')').filter(i => i.length).map(i => {
   const item = i.split('(');
   return {
     key: item[0].trim(),
     value: item[1].split('|')
   }
})

console.log(array)
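One caveat with the plain split approach: a piece that has no "(" (for example trailing text after the last group) leaves item[1] undefined, and calling split on it throws. A possible guarded variant is sketched below. The function name splitTerms and the extra trailing word "unisex" in the sample input are assumptions made for illustration only.

```javascript
// Hypothetical helper: same split-based idea, but tolerant of pieces without "(".
function splitTerms(input) {
  return input
    .split(")")
    .map((segment) => segment.trim())
    .filter((segment) => segment.length > 0)
    .map((segment) => {
      // Default to an empty string when the segment has no "(" part.
      const [key, rawValues = ""] = segment.split("(");
      return {
        key: key.trim(),
        value: rawValues
          .split("|")
          .map((v) => v.trim())
          .filter((v) => v.length > 0),
      };
    });
}

const guarded = splitTerms("addias (brand|type) sneakers(product) for men(o) unisex");
console.log(guarded);
```

With this input the last object has the key "unisex" and an empty value array instead of raising a TypeError.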

CodePudding user response:

It may be simpler to use the exec method to iterate over the patterns the regex finds.

const str = 'addias(brand|type|size|color) sneakers(pro) for men(o)';

// The regex looks for an initial group of letters,
// then matches the string inside the parentheses.
// Note: [a-z]+ does not match spaces, so "for men(o)" yields the key "men".
const regex = /([a-z]+)\(([a-z\|]+)\)/g;

let myArray;
const arr = [];

while ((myArray = regex.exec(str)) !== null) {

  // Destructure out the key and the delimited string
  const [_,  key, ...rest] = myArray;

  // `split` on the string found in `rest` first element
  const values = rest[0].split('|');

  // Finally push a new object into the output array
  // (removing "o" for whatever reason)
  arr.push({
    key,
    value: values.filter(v => v !== 'o')
  });
}

console.log(arr);
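If keys with spaces such as "for men" need to be kept whole, one possible variation is to let the key group accept several space-separated words and to iterate with matchAll and named groups instead of an exec loop. This is only a sketch: the function name parseSchema and the ignored parameter are made up here, and treating "o" as a placeholder to drop is an assumption based on the expected output in the question.

```javascript
// Hypothetical helper; `ignored` lists values to drop (assumed: "o" is a placeholder).
function parseSchema(input, ignored = new Set(["o"])) {
  // key: one or more words separated by single spaces
  // values: letters and pipes inside the parentheses
  const pattern = /(?<key>[a-z]+(?: [a-z]+)*)\s*\((?<values>[a-z|]*)\)/gi;

  return Array.from(input.matchAll(pattern), ({ groups }) => ({
    key: groups.key,
    value: groups.values
      .split("|")
      .filter((v) => v !== "" && !ignored.has(v)),
  }));
}

const parsed = parseSchema("addias (brand|type) sneakers(product) for men(o)");
console.log(parsed);
```

Passing an empty Set as the second argument keeps every value, including "o".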

CodePudding user response:

With a little help from regex101.com, I derived the following regex and code.

([^\(]+)\(([^\)]*)\) which breaks down into

([^\(]+) - capture 1 or more chars up to the first ( as group 1

\( - swallow the left parens

([^\)]*) - capture everything up to the next occurrence of ) as group 2

\) - swallow the right parens

I started writing [^|]+ to parse the text of group 2, but it is actually simpler to use a plain split call.

    function generateSchema(str) {
        const regex = /([^\(]+)\(([^\)]*)\)/mg;  // captures the 'word (word)' pattern
        let m;
        let output = [];
        let obj = {};
    
        while ((m = regex.exec(str)) !== null) {
    
            // This is necessary to avoid infinite loops with zero-width matches
            if (m.index === regex.lastIndex) {
                regex.lastIndex++;
            }
        
            m.forEach((match, groupIndex) => {
                if (groupIndex === 1) {
                    obj = {};
                    obj.key = match.trim();
                } else if (groupIndex === 2) {
                    obj.value = match.split('|').map(i=>i.trim());
                    output.push(obj);
                }
            });
        }
        return output;
    }

    const str = "addidas (brand | type  ) sneakers(product) for men(o)";
    console.log(generateSchema(str));
