How to add nodes/elements as attribute to XML node using xslt

Time:09-29

I want to convert XHTML into XML as follows, but I cannot figure out how to do it. I want to read the content of each input div.cmp-text and add it to an attribute of an XML element.

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<result>
    <div class="cmp-text">
        <strong xmlns="http://www.w3.org/1999/xhtml">Content</strong>
        <span xmlns="http://www.w3.org/1999/xhtml"
            class="data-class">May 19, 2020
        </span>
        <h2 xmlns="http://www.w3.org/1999/xhtml">Description</h2>
        <p xmlns="http://www.w3.org/1999/xhtml">
            Lorem ipsum dolor sit amet, consectetur adipisicing.
        </p>
    </div>
    
    <div class="cmp-horizontal-line">
        <hr xmlns="http://www.w3.org/1999/xhtml"/>
    </div>
    
    <div class="cmp-text">
        <ul xmlns="http://www.w3.org/1999/xhtml">
            <li>
                Lorem ipsum.
            </li>
        </ul>
        <table xmlns="http://www.w3.org/1999/xhtml"
            style="border-collapse: collapse;"
            border="1">
            <tbody>
                <tr>
                    <td style="width: 33.3333%;">111</td>
                    <td style="width: 33.3333%;">212</td>
                </tr>
            </tbody>
        </table>
    </div>
    
    <div class="cmp-horizontal-line">
        <hr xmlns="http://www.w3.org/1999/xhtml"/>
    </div>
</result>

Expected output:

<?xml version="1.0" encoding="UTF-8"?>
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <result>
        <text
            type="/text"
            text="&lt;strong xmlns='http://www.w3.org/1999/xhtml'&gt;Content&lt;/strong&gt;&lt;span xmlns='http://www.w3.org/1999/xhtml' class='data-class'&gt;May 19, 2020&lt;/span&gt;&lt;h2 xmlns='http://www.w3.org/1999/xhtml'&gt;Description&lt;/h2&gt;&lt;p xmlns='http://www.w3.org/1999/xhtml'&gt;Lorem ipsum dolor sit amet, consectetur adipisicing.&lt;/p&gt;"
            textIsRich="true"/>
        <horizontal_line type="/horizontal-line"/>
        <text type="/text"
            text="&lt;ul xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;li&gt;Lorem ipsum.&lt;/li&gt;&lt;/ul&gt;&lt;table xmlns='http://www.w3.org/1999/xhtml' style='border-collapse: collapse;' border='1'&gt;&lt;tbody>&lt;tr>&lt;td style='width: 33.3333%;'>111&lt;/td>&lt;td style='width: 33.3333%;'>212&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table>"
            textIsRich="true"/>
        <horizontal_line type="/horizontal-line"/>
    </result>
</result>

XSL:

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0">

    <xsl:output version="1.0"
        encoding="UTF-8"
        indent="yes"
        method="xml"
        omit-xml-declaration="no"/>
    <xsl:strip-space elements="*"/>

    <!--root element-->
    <xsl:template match="/">
        <result>
            <xsl:apply-templates/>
        </result>
    </xsl:template>

    <!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
    <xsl:template match="/result/div[@class='cmp-text']">
        <text>
            <xsl:attribute name="type">/text</xsl:attribute>
            <xsl:attribute name="text">value</xsl:attribute>
            <xsl:attribute name="text2">
                <xsl:value-of select="node()"/>
            </xsl:attribute>
            <xsl:attribute name="text3">
                <xsl:value-of select=".//*"/>
            </xsl:attribute>
        </text>
    </xsl:template>

    <!--horizontal line-->
    <xsl:template match="/result/div[@class='cmp-horizontal-line']">
        <horizontal_line type="/horizontal-line"/>
    </xsl:template>

    <!--horizontal line-->
    <xsl:template match="/result/xhtml:div[@class='cmp-horizontal-line']">
        <horizontal_line type="/horizontal-line"/>
    </xsl:template>

    <!--identity template copies everything forward by default-->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output XML using above XSL:

<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <result>
        <text type="/text"
            text="value"
            text2="Last Reviewed:"
            text3="Last Reviewed:"/>        
        <horizontal_line type="/horizontal-line"/>
        <text type="/text"
            text="value"
            text2="Criteria"
            text3="Criteria"/>
        <horizontal_line type="/horizontal-line"/>
    </result>
</result>

In the text element, the attributes text, text2 and text3 are my unsuccessful attempts to get the HTML nodes into the attribute as-is.

How can I get the desired output?

Update: Updated the desired output to well-formed XML.

The solution needs to be in XSLT 1.0, so I can't use serialize().

After Martin's comment, I used the xml-to-string stylesheet from lenzconsulting.com/xml-to-string and was able to get the desired result by making the following changes to the XSL script:

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

    <xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>

    <xsl:template match="/result/div[@class='cmp-text']">
        <text>
            <xsl:attribute name="type">/text</xsl:attribute>
            <xsl:attribute name="text">
                <xsl:apply-templates select="./*" mode="xml-to-string"/>
            </xsl:attribute>
        </text>
    </xsl:template>
</xsl:stylesheet>

which produced the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <result>
        <text
            type="/text"
            text="&lt;strong xmlns='http://www.w3.org/1999/xhtml'&gt;Content&lt;/strong&gt;&lt;span xmlns='http://www.w3.org/1999/xhtml' class='data-class'&gt;May 19, 2020&lt;/span&gt;&lt;h2 xmlns='http://www.w3.org/1999/xhtml'&gt;Description&lt;/h2&gt;&lt;p xmlns='http://www.w3.org/1999/xhtml'&gt;Lorem ipsum dolor sit amet, consectetur adipisicing.&lt;/p&gt;"
            textIsRich="true"/>
        <horizontal_line type="/horizontal-line"/>
        <text type="/text"
            text="&lt;ul xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;li&gt;Lorem ipsum.&lt;/li&gt;&lt;/ul&gt;&lt;table xmlns='http://www.w3.org/1999/xhtml' style='border-collapse: collapse;' border='1'&gt;&lt;tbody>&lt;tr>&lt;td style='width: 33.3333%;'>111&lt;/td>&lt;td style='width: 33.3333%;'>212&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table>"
            textIsRich="true"/>
        <horizontal_line type="/horizontal-line"/>
    </result>
</result>
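
If importing a stylesheet from a remote URL is not an option, the idea behind the xml-to-string mode can be sketched by hand in plain XSLT 1.0. The following is only a minimal illustration, not the lenzconsulting.com implementation: the mode name to-string and the variable q are made up here, it assumes the XHTML elements use a default namespace (no prefixes), as in the sample input, and it does not re-escape special characters such as & or ' inside text and attribute values. Text nodes are copied unchanged by the built-in template rule, so surrounding whitespace in the source is kept.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:strip-space elements="*"/>

    <!-- A single quote, kept in a variable so it can be used inside concat(). -->
    <xsl:variable name="q">'</xsl:variable>

    <xsl:template match="/result/div[@class='cmp-text']">
        <text type="/text" textIsRich="true">
            <xsl:attribute name="text">
                <xsl:apply-templates select="*" mode="to-string"/>
            </xsl:attribute>
        </text>
    </xsl:template>

    <!-- Hypothetical mode "to-string": writes each element as literal text. -->
    <xsl:template match="*" mode="to-string">
        <xsl:value-of select="concat('&lt;', name())"/>
        <!-- Declare the namespace only where it differs from the parent's. -->
        <xsl:if test="namespace-uri() != '' and namespace-uri() != namespace-uri(..)">
            <xsl:value-of select="concat(' xmlns=', $q, namespace-uri(), $q)"/>
        </xsl:if>
        <!-- Attributes are written with single quotes; values are not escaped. -->
        <xsl:for-each select="@*">
            <xsl:value-of select="concat(' ', name(), '=', $q, ., $q)"/>
        </xsl:for-each>
        <xsl:choose>
            <xsl:when test="node()">
                <xsl:text>&gt;</xsl:text>
                <!-- Child elements recurse; text nodes fall through to the built-in rule. -->
                <xsl:apply-templates mode="to-string"/>
                <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>/&gt;</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

The root, horizontal-line and identity templates from the stylesheet in the question would still need to be added around this. When the result is written out, the XSLT serializer escapes the generated < characters as &lt; inside the text attribute, so the output stays well-formed.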

Answer:

In XSLT 3.0, your template could use serialize(), e.g.

<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
    <text>
        <xsl:attribute name="type">/text</xsl:attribute>
        <xsl:attribute name="text" select="serialize(*)"/>
    </text>
</xsl:template>

which could be simplified to e.g.

<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
    <text type="/text" text="{serialize(*)}"/>
</xsl:template>

The output would then look more like this:

  <text type="/text"
        text="&lt;strong xmlns=&#34;http://www.w3.org/1999/xhtml&#34;&gt;Content&lt;/strong&gt;&lt;span xmlns=&#34;http://www.w3.org/1999/xhtml&#34; class=&#34;data-class&#34;&gt;May 19, 2020&#xA;        &lt;/span&gt;&lt;h2 xmlns=&#34;http://www.w3.org/1999/xhtml&#34;&gt;Description&lt;/h2&gt;&lt;p xmlns=&#34;http://www.w3.org/1999/xhtml&#34;&gt;&#xA;            Lorem ipsum dolor sit amet, consectetur adipisicing.&#xA;        &lt;/p&gt;"/>

If you really need to go the route of producing non-well-formed results, then in XSLT 3 a character map can help, e.g.

   <xsl:output version="1.0"
        encoding="UTF-8"
        indent="yes"
        method="xml"
        omit-xml-declaration="no" use-character-maps="m1"/>
    
    <xsl:character-map name="m1">
      <xsl:output-character character="&lt;" string="&lt;"/>
      <xsl:output-character character="&gt;" string=">"/>
      <xsl:output-character character="&quot;" string="&quot;"/>
    </xsl:character-map>

Saxon then produces output like e.g.

  <text type="/text"
        text='<strong xmlns="http://www.w3.org/1999/xhtml">Content</strong><span xmlns="http://www.w3.org/1999/xhtml" class="data-class">May 19, 2020&#xA;        </span><h2 xmlns="http://www.w3.org/1999/xhtml">Description</h2><p xmlns="http://www.w3.org/1999/xhtml">&#xA;            Lorem ipsum dolor sit amet, consectetur adipisicing.&#xA;        </p>'/>