I want to convert XHTML into XML as shown below, but I cannot figure out how to do it. I want to read the content of each input div.cmp-text and put it, as markup, into an attribute of an XML element.
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<result>
<div class="cmp-text">
<strong xmlns="http://www.w3.org/1999/xhtml">Content</strong>
<span xmlns="http://www.w3.org/1999/xhtml"
class="data-class">May 19, 2020
</span>
<h2 xmlns="http://www.w3.org/1999/xhtml">Description</h2>
<p xmlns="http://www.w3.org/1999/xhtml">
Lorem ipsum dolor sit amet, consectetur adipisicing.
</p>
</div>
<div class="cmp-horizontal-line">
<hr xmlns="http://www.w3.org/1999/xhtml"/>
</div>
<div class="cmp-text">
<ul xmlns="http://www.w3.org/1999/xhtml">
<li>
Lorem ipsum.
</li>
</ul>
<table xmlns="http://www.w3.org/1999/xhtml"
style="border-collapse: collapse;"
border="1">
<tbody>
<tr>
<td style="width: 33.3333%;">111</td>
<td style="width: 33.3333%;">212</td>
</tr>
</tbody>
</table>
</div>
<div class="cmp-horizontal-line">
<hr xmlns="http://www.w3.org/1999/xhtml"/>
</div>
</result>
Expected output:
<?xml version="1.0" encoding="UTF-8"?>
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
xmlns:cq="http://www.day.com/jcr/cq/1.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<result>
<text
type="/text"
text="&lt;strong xmlns='http://www.w3.org/1999/xhtml'&gt;Content&lt;/strong&gt;&lt;span xmlns='http://www.w3.org/1999/xhtml' class='data-class'&gt;May 19, 2020&lt;/span&gt;&lt;h2 xmlns='http://www.w3.org/1999/xhtml'&gt;Description&lt;/h2&gt;&lt;p xmlns='http://www.w3.org/1999/xhtml'&gt;Lorem ipsum dolor sit amet, consectetur adipisicing.&lt;/p&gt;"
textIsRich="true"/>
<horizontal_line type="/horizontal-line"/>
<text type="/text"
text="&lt;ul xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;li&gt;Lorem ipsum.&lt;/li&gt;&lt;/ul&gt;&lt;table xmlns='http://www.w3.org/1999/xhtml' style='border-collapse: collapse;' border='1'&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style='width: 33.3333%;'&gt;111&lt;/td&gt;&lt;td style='width: 33.3333%;'&gt;212&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;"
textIsRich="true"/>
<horizontal_line type="/horizontal-line"/>
</result>
</result>
XSL:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
xmlns:cq="http://www.day.com/jcr/cq/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sling="http://sling.apache.org/jcr/sling/1.0">
<xsl:output version="1.0"
encoding="UTF-8"
indent="yes"
method="xml"
omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<!--root element-->
<xsl:template match="/">
<result>
<xsl:apply-templates/>
</result>
</xsl:template>
<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
<text>
<xsl:attribute name="type">/text</xsl:attribute>
<xsl:attribute name="text">value</xsl:attribute>
<xsl:attribute name="text2">
<xsl:value-of select="node()"/>
</xsl:attribute>
<xsl:attribute name="text3">
<xsl:value-of select=".//*"/>
</xsl:attribute>
</text>
</xsl:template>
<!--horizontal line-->
<xsl:template match="/result/div[@class='cmp-horizontal-line']">
<horizontal_line type="/horizontal-line"/>
</xsl:template>
<!--horizontal line-->
<xsl:template match="/result/xhtml:div[@class='cmp-horizontal-line']">
<horizontal_line type="/horizontal-line"/>
</xsl:template>
<!--identity template copies everything forward by default-->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output XML using above XSL:
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
xmlns:cq="http://www.day.com/jcr/cq/1.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<result>
<text type="/text"
text="value"
text2="Last Reviewed:"
text3="Last Reviewed:"/>
<horizontal_line type="/horizontal-line"/>
<text type="/text"
text="value"
text2="Criteria"
text3="Criteria"/>
<horizontal_line type="/horizontal-line"/>
</result>
</result>
In the text element, the attributes text, text2 and text3 are my unsuccessful attempts to get the child nodes (HTML) as-is into the attribute.
How can I get the desired output?
Update: Updated the desired output to well-formed XML.
The solution needs to be in XSLT 1.0, so I can't use serialize().
After Martin's comment, I used lenzconsulting.com/xml-to-string and was able to get the desired result by making the following changes to the XSL script:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>
<xsl:template match="/result/div[@class='cmp-text']">
<text>
<xsl:attribute name="type">/text</xsl:attribute>
<xsl:attribute name="text">
<xsl:apply-templates select="./*" mode="xml-to-string"/>
</xsl:attribute>
</text>
</xsl:template>
</xsl:stylesheet>
which produced the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
xmlns:cq="http://www.day.com/jcr/cq/1.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<result>
<text
type="/text"
text="&lt;strong xmlns='http://www.w3.org/1999/xhtml'&gt;Content&lt;/strong&gt;&lt;span xmlns='http://www.w3.org/1999/xhtml' class='data-class'&gt;May 19, 2020&lt;/span&gt;&lt;h2 xmlns='http://www.w3.org/1999/xhtml'&gt;Description&lt;/h2&gt;&lt;p xmlns='http://www.w3.org/1999/xhtml'&gt;Lorem ipsum dolor sit amet, consectetur adipisicing.&lt;/p&gt;"
textIsRich="true"/>
<horizontal_line type="/horizontal-line"/>
<text type="/text"
text="&lt;ul xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;li&gt;Lorem ipsum.&lt;/li&gt;&lt;/ul&gt;&lt;table xmlns='http://www.w3.org/1999/xhtml' style='border-collapse: collapse;' border='1'&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style='width: 33.3333%;'&gt;111&lt;/td&gt;&lt;td style='width: 33.3333%;'&gt;212&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;"
textIsRich="true"/>
<horizontal_line type="/horizontal-line"/>
</result>
</result>
Answer:
With XSLT 3.0 you could use the serialize() function, so your template would be e.g.
<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
<text>
<xsl:attribute name="type">/text</xsl:attribute>
<xsl:attribute name="text" select="serialize(*)"/>
</text>
</xsl:template>
which could be simplified with attribute value templates to e.g.
<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
<text type="/text" text="{serialize(*)}"/>
</xsl:template>
The output would then look more like this:
<text type="/text"
text="&lt;strong xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;Content&lt;/strong&gt;&lt;span xmlns=&quot;http://www.w3.org/1999/xhtml&quot; class=&quot;data-class&quot;&gt;May 19, 2020
 &lt;/span&gt;&lt;h2 xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;Description&lt;/h2&gt;&lt;p xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
 Lorem ipsum dolor sit amet, consectetur adipisicing.
 &lt;/p&gt;"/>
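The line breaks and spaces inside the serialized span and p come straight from the input text nodes, whereas the expected output in the question has them trimmed. As a rough sketch only: one way to get closer is to copy the children through a helper mode that normalizes text nodes before calling serialize(). The mode name trim and the variable name trimmed are made up for this illustration, and the horizontal-line template from the question is left out for brevity.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default mode: copy anything not matched by a more specific template. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Hypothetical helper mode "trim": copies nodes unchanged except text nodes. -->
  <xsl:mode name="trim" on-no-match="shallow-copy"/>

  <!-- Strip leading/trailing whitespace and collapse inner runs to one space. -->
  <xsl:template match="text()" mode="trim">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

  <xsl:template match="/result/div[@class='cmp-text']">
    <!-- Build trimmed copies of the child elements, then serialize those copies. -->
    <xsl:variable name="trimmed" as="element()*">
      <xsl:apply-templates select="*" mode="trim"/>
    </xsl:variable>
    <text type="/text" text="{serialize($trimmed)}" textIsRich="true"/>
  </xsl:template>

</xsl:stylesheet>
```

Be aware that normalize-space() also removes the space between a text run and an adjacent inline element (for instance before a strong inside a p), so this is only appropriate if the rich text does not depend on such spacing.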
If you really need to go the route of producing non-well-formed results, with unescaped markup inside the attribute, then in XSLT 3 a character map can help, e.g.
<xsl:output version="1.0"
encoding="UTF-8"
indent="yes"
method="xml"
omit-xml-declaration="no" use-character-maps="m1"/>
<xsl:character-map name="m1">
<xsl:output-character character="&lt;" string="&lt;"/>
<xsl:output-character character="&gt;" string="&gt;"/>
<xsl:output-character character="&quot;" string="&quot;"/>
</xsl:character-map>
Saxon then produces output like this:
<text type="/text"
text='<strong xmlns="http://www.w3.org/1999/xhtml">Content</strong><span xmlns="http://www.w3.org/1999/xhtml" class="data-class">May 19, 2020
 </span><h2 xmlns="http://www.w3.org/1999/xhtml">Description</h2><p xmlns="http://www.w3.org/1999/xhtml">
 Lorem ipsum dolor sit amet, consectetur adipisicing.
 </p>'/>
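Since the question is restricted to XSLT 1.0, where serialize() is not available, here is a minimal sketch of the idea behind a library such as xml-to-string: a separate mode that writes each element out as literal text. It is not that library's implementation. The mode name to-text and the variable apos are invented for this example, and it makes several simplifying assumptions: element names are unprefixed (default namespace), the parent div is in no namespace, special characters in text and attribute values are not re-escaped, and text nodes are whitespace-normalized.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- A single apostrophe, used to quote attribute values in the generated markup. -->
  <xsl:variable name="apos">'</xsl:variable>

  <xsl:template match="/">
    <result>
      <xsl:apply-templates select="result/div"/>
    </result>
  </xsl:template>

  <xsl:template match="div[@class='cmp-text']">
    <text type="/text" textIsRich="true">
      <xsl:attribute name="text">
        <xsl:apply-templates select="*" mode="to-text"/>
      </xsl:attribute>
    </text>
  </xsl:template>

  <xsl:template match="div[@class='cmp-horizontal-line']">
    <horizontal_line type="/horizontal-line"/>
  </xsl:template>

  <!-- Hypothetical mode "to-text": emit start tag, attributes, content and end tag as text. -->
  <xsl:template match="*" mode="to-text">
    <xsl:value-of select="concat('&lt;', name())"/>
    <!-- Declare the default namespace only where it differs from the parent's. -->
    <xsl:if test="namespace-uri() != namespace-uri(..)">
      <xsl:value-of select="concat(' xmlns=', $apos, namespace-uri(), $apos)"/>
    </xsl:if>
    <xsl:apply-templates select="@*" mode="to-text"/>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates select="node()" mode="to-text"/>
    <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
  </xsl:template>

  <!-- Assumption: attribute values contain no apostrophes, '&lt;' or '&amp;'. -->
  <xsl:template match="@*" mode="to-text">
    <xsl:value-of select="concat(' ', name(), '=', $apos, ., $apos)"/>
  </xsl:template>

  <!-- Assumption: trimming and collapsing whitespace in text is acceptable. -->
  <xsl:template match="text()" mode="to-text">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

The XSLT serializer escapes the generated angle brackets when it writes the attribute, so the result stays well-formed. Unlike the stylesheet in the question, this sketch emits a single result wrapper, and empty elements such as hr come out as a start/end tag pair. For real content with entities, prefixed names, comments or processing instructions, a complete library like xml-to-string is the safer choice.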