Split any HTML element separately using regex in JavaScript

What I'm trying to achieve: when I split the inner contents of an element, each word should become a separate item, but every HTML element must stay together as a single item in the split result.

For example:

<p id="name1" > We deliver
      <span >software</span> &
      <span >websites</span> for your organization<span >.</span>
</p>

In the example above, I want each <span>, including its tags and contents, to be one array item after splitting the inner contents of #name1.

So in other words, I want the split array to look like this:

[
 'We',
 'deliver',
 '<span >software</span>',
 '&',
 '<span >websites</span>'

 ... etc.
]

This is what I have so far. It does not work: it only splits on whitespace that comes directly before a <span, so the plain text is never split into words and the text inside an element is not treated as part of that element. I would also like it to work for any HTML element, not just <span>.

let sentence = el.innerHTML; // el being #name1 in this case
let words = sentence.split(/\s(?=<span)/i);

How would I be able to achieve this with regex? Is this possible? Thank you for any help.

CodePudding user response:

Here is a DOMParser-based solution. It parses the HTML and then iterates over the child nodes of the top node. If a child is an element, its outer HTML is pushed onto the result array; if it is a text node, the text is split on whitespace and the resulting words are added to the result array:

const html = `<p id="name1" > We deliver
      <span >software</span> &
      <span >websites</span> for your organization<span >.</span>
</p>`

const parser = new DOMParser();
const s = parser.parseFromString(html, 'text/html');
let result = [];
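for (const el of s.body.firstChild.childNodes) {
  if (el.nodeType === 3 /* TEXT_NODE */ ) {
    // Split on any run of whitespace (spaces, tabs, newlines) and drop empty strings
    result = result.concat(el.nodeValue.trim().split(/\s+/).filter(Boolean));
  }
  else if (el.nodeType === 1 /* ELEMENT_NODE */ ) {
    // Keep the whole element, tags included, as one array item
    result.push(el.outerHTML);
  }
}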

console.log(result);
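
As for whether this is possible with a regex alone: for flat markup like the example in the question, a rough sketch is shown below. It is not a general HTML parser. It assumes that an element never contains another element with the same tag name and that attribute values contain no ">" character. The function name splitKeepingElements is hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: tokenize flat HTML into whole elements and single words.
// Assumptions: no same-named nested elements, no ">" inside attribute values.
function splitKeepingElements(html) {
  // Alternative 1: an opening tag, its content (lazy), and the closing tag
  //                with the same name (backreference \1).
  // Alternative 2: a run of characters that is neither whitespace nor "<".
  const tokenPattern = /<([a-z][\w-]*)\b[^>]*>[\s\S]*?<\/\1\s*>|[^\s<]+/gi;
  // match() returns null when nothing matches, so fall back to an empty array
  return html.match(tokenPattern) || [];
}

const inner = ` We deliver
      <span >software</span> &
      <span >websites</span> for your organization<span >.</span>
`;

console.log(splitKeepingElements(inner));
// [ 'We', 'deliver', '<span >software</span>', '&',
//   '<span >websites</span>', 'for', 'your', 'organization',
//   '<span >.</span>' ]
```

Note that when the string comes from el.innerHTML, the browser serializes a bare & in text as &amp;, so that token would appear as '&amp;' instead of '&'.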

CodePudding user response:

Details are commented in the example below.

const nodeSplitter = (mainNode) => {
  let scan;
  /*
  Check if initial node has text or elements
  */
  if (mainNode.hasChildNodes()) {
    scan =
      /*
      Collect all elements, text, and comments
      into an array
      */
      Array.from(mainNode.childNodes)
      /*
      If node is an element, return it...
      ...if node is text, use `.matchAll()` to
      find each word and add to array...
      .filter() any falsy values and flatten
      the array and then return it
      */
      .flatMap(node => {
        if (node.nodeType === 1) {
          return node;
        } else if (node.nodeType === 3) {
          // A "word" is one or more word characters or any of \ - . ] &
          const rgx = /[\w\\\-.\]&]+/g;
          const strings = [...node.textContent.matchAll(rgx)]
            .filter(match => match).flat();
          return strings;
        } else {
          /*
          Otherwise, return empty array which is 
          basically nothing since .flatMap() 
          flattens an array as default
          */
          return [];
        }
      });
  } else {
    // Return if mainNode is empty
    return;
  }
  // return results
  return scan;
}

const main = document.getElementById('name1');
console.log(nodeSplitter(main));

<p id="name1" > We deliver
  <span >software</span> &
  <span >websites</span> for your organization
  <span >.</span>
</p>
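
The function above returns the element nodes themselves rather than markup strings. If an array of strings is needed, as in the question, one possible follow-up step is to map each element to its outerHTML. The helper name toHtmlStrings below is hypothetical and not part of the answer; the plain objects only stand in for real elements so that the sketch also runs without a DOM.

```javascript
// Hypothetical helper: turn a mixed array of strings and element-like
// objects into an array of strings by reading each element's outerHTML.
const toHtmlStrings = (items) =>
  items.map(item => (typeof item === 'string' ? item : item.outerHTML));

// In a browser this would be used as:
//   toHtmlStrings(nodeSplitter(document.getElementById('name1')));

// Stand-in objects (assumption: only the outerHTML property is needed)
const sample = ['We', 'deliver', { outerHTML: '<span >software</span>' }, '&'];
console.log(toHtmlStrings(sample));
// [ 'We', 'deliver', '<span >software</span>', '&' ]
```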