Home > front end >  How to parse XML and serialize it again in a custom way which shows empty elements as expanded tags?
How to parse XML and serialize it again in a custom way which shows empty elements as expanded tags?

Time:12-21

I have the following which can be dynamic

const data = '<users>
 <user>
  <firstname>test</firstname>
  <lastname />
  <age />
 </user>
</users>'

I want to find every instance where its closed with a /> and expand e.g <age /> becomes <age></age>

obviously I can do this with named tags e.g

data = data.replaceAll("age />", "age></age>")

but how do I do this globally for tags which I dont know the name?

CodePudding user response:

From the above comments ...

"The OP could create an XMLDocument via DOMParser.parseFromString like e.g. const doc = (new DOMParser).parseFromString(data, 'application/xml');. But then the OP needs to write own serializing functionality since the XMLSerializer's serializeToString method will serialize each empty node into its empty tag representation."

The next provided example code comes with a custom serializer in order to prove that the suggested approach can be implemented with not that much effort.

const data = `<users lang="en">
  <user uuid="1232-3965-6923-2887">
    <firstname>test</firstname>
    <lastname/>
    <age type="integer"/>
  </user>
</users>`;

const xmlDoc = (new DOMParser).parseFromString(data, 'application/xml');

const xmlSerializerMarkup = (new XMLSerializer).serializeToString(xmlDoc);
const documentElementMarkup = xmlDoc.documentElement.outerHTML;

console.log('originally provided markup...\n', data);

console.log('XMLSerializer markup...\n', xmlSerializerMarkup);
console.log('documentElement markup...\n', documentElementMarkup);

function serialize(elmNode, indention = '') {
  const { nodeName, childElementCount } = elmNode;

  let currentMarkup = elmNode
    .getAttributeNames()
    .reduce((markup, attrName) => [
        markup,
        ' ',
        attrName,
        '="',
        elmNode.getAttribute(attrName),
        '"',
    ].join(''), `${ indention }<${ nodeName }`);

  if (childElementCount === 0) {
    const { textContent } = elmNode;

    if (textContent === '') {
      // entirely empty element node.

      // do not implement/support the empty tag style.
      currentMarkup = currentMarkup   `><\/${ nodeName }>`;
    } else {
      currentMarkup = [
        currentMarkup,
        '>',
        textContent,
        `<\/${ nodeName }>`,
      ].join('');
    }
  } else {
    let nestedMarkup = '';

    [...elmNode.children]
      .forEach(childElement => {
        nestedMarkup = [
          nestedMarkup,
          serialize(childElement, (indention   '  ')),
        ].join('\n');
      });

    currentMarkup = [
      currentMarkup,
      nestedMarkup,
    ].join('>')   `\n${ indention }<\/${ nodeName }>`;
  }
  return currentMarkup;
}
const customSerializerMarkup = serialize(xmlDoc.documentElement);

console.log('custom serializer markup...\n', customSerializerMarkup);
.as-console-wrapper { min-height: 100%!important; top: 0; }

CodePudding user response:

Would you mind using/adapting something like JsStAX (full disclosure: a little portation project of mine to make an implementation of Java's Streaming API for XML available in JavaScript)?

Benefit is that with the outer event loop, you have pretty good control over what output you want to generate, as well as no trouble with any variations in the XML format.

Is it too heavy, are required/desired features missing (as it's not XML-feature-complete), objecting to the license? But does the job, apart from the many possible improvements or other, potentially better solutions.

  • Related