I am new to XSLT and really hoping someone can help me out. I get an XML document from a program and I need to convert it to a CSV file with a Ctrl-A delimiter. I am already able to convert it to a |-delimited file.
The requirement is a Ctrl-A delimiter, so the XSLT should write the Unicode character U+0001 (Ctrl-A) as the delimiter.
XML input file:
<Content xmlns="http://www.taleo.com/ws/integration/toolkit/2005/07">
<ExportXML>
<record>
<field name="Number">12663342</field>
<field name="FileName">Document.pdf</field>
<field name="LastModificationDate">2022-07-17 16:31:29</field>
</record>
<record>
<field name="Number">12663324</field>
<field name="FileName">Rishabh's| Resume.pdf</field>
<field name="LastModificationDate">2022-07-17 06:38:44</field>
</record>
</ExportXML>
</Content>
XSLT file:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:itk="http://www.taleo.com/ws/integration/toolkit/2005/07" xmlns:fct="http://www.taleo.com/xsl_functions" xmlns:quer="http://www.taleo.com/ws/integration/query">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:param name="csvDelimiter">|</xsl:param>
<xsl:param name="csvQuoteCharacter">"</xsl:param>
<!-- ======================================= -->
<!-- Root template. -->
<!-- ======================================= -->
<xsl:template match="/">
<!-- Process records. -->
<xsl:apply-templates select="//itk:record"/>
<!-- Build trailer record. -->
<xsl:text>TRL</xsl:text>
<xsl:value-of select="format-number(count(//itk:record), '000000000')"/>
<xsl:text>&#10;</xsl:text>
</xsl:template>
<xsl:function name="fct:nvl">
<xsl:param name="value"/>
<xsl:param name="replace-with"/>
<xsl:choose>
<xsl:when test="string-length($value) > 0">
<xsl:value-of select="$value"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$replace-with"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<!-- ======================================= -->
<!-- Template matching each record. -->
<!-- ======================================= -->
<xsl:template match="itk:record">
<xsl:for-each select="itk:field">
<xsl:value-of select="fct:quote(.)"/>
<xsl:if test="position() != last()">
<xsl:value-of select="$csvDelimiter"/>
</xsl:if>
</xsl:for-each>
<xsl:text>&#10;</xsl:text>
</xsl:template>
<!-- ======================================= -->
<!-- Quote a value if it contains the csvDelimiter or the csvQuoteCharacter. -->
<!-- ======================================= -->
<xsl:function name="fct:quote">
<xsl:param name="value"/>
<xsl:choose>
<xsl:when test="contains($value, $csvDelimiter) or contains($value, $csvQuoteCharacter)">
<xsl:value-of select="$csvQuoteCharacter"/>
<xsl:value-of select="replace($value, $csvQuoteCharacter, concat($csvQuoteCharacter, $csvQuoteCharacter))"/>
<xsl:value-of select="$csvQuoteCharacter"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$value"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
</xsl:stylesheet>
Current output:
12663342|Document.pdf|2022-07-17 16:31:29
12663324|"Rishabh's| Resume.pdf"|2022-07-17 06:38:44
TRL|2
I have never created a CSV file with a Ctrl-A delimiter. Searching the internet for a sample file, I found the one below, which uses ^A (Unicode U+0001) as the delimiter and terminates every line with ^B (U+0002):
10000^AA17^Aa17^A2423^B
10001^AA18^Aa18^A2423^B
10002^AA19^Aa19^A2423^B
10003^AA20^Aa20^A2423^B
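To check a generated file against this layout, the records can be split on the two control characters. This is a minimal Java sketch, assuming ^A (U+0001) separates fields and ^B (U+0002) ends each record; the class name CtrlARecords and the skipping of an optional line feed after ^B are illustrative assumptions, not part of the sample above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CtrlARecords {
    static final char FIELD_SEP = '\u0001';  // ^A
    static final char RECORD_END = '\u0002'; // ^B

    // Splits data into records at ^B and each record into fields at ^A.
    // Text after the last ^B (an unterminated record) is ignored.
    static List<List<String>> parse(String data) {
        List<List<String>> records = new ArrayList<>();
        String fieldRegex = Pattern.quote(String.valueOf(FIELD_SEP));
        int start = 0;
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) == RECORD_END) {
                // limit -1 keeps empty trailing fields.
                records.add(Arrays.asList(data.substring(start, i).split(fieldRegex, -1)));
                start = i + 1;
                // Assumption: a line feed right after ^B is cosmetic, so skip it.
                if (start < data.length() && data.charAt(start) == '\n') {
                    start++;
                    i++;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "10000\u0001A17\u0001a17\u00012423\u0002\n"
                      + "10001\u0001A18\u0001a18\u00012423\u0002\n";
        // Prints [[10000, A17, a17, 2423], [10001, A18, a18, 2423]]
        System.out.println(parse(sample));
    }
}
```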
Answer:
You will need to set your csvDelimiter parameter to the Control-A value, written as a character reference, e.g. like so:
<xsl:param name="csvDelimiter">&#x1;</xsl:param>
But you will also need to add an XML declaration to the start of your stylesheet file, like so:
<?xml version="1.1"?>
The reason is that the Control-A character, U+0001, is not a valid character in XML 1.0, not even when written as a character reference. In the most recent version of XML, namely version 1.1, this character (along with several other control characters) is allowed, provided it is written as a character reference such as &#x1; rather than typed literally. If your XSLT file lacks an XML declaration then it is, by default, an XML 1.0 file, i.e. an XML file with no XML declaration is effectively treated as if it had the declaration <?xml version="1.0"?>. Your XSLT processor's XML parser must also support XML 1.1 for this to work.
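One way to see both points of this answer is to compile the same stylesheet twice, once declared as XML 1.1 and once as XML 1.0. This is a sketch under assumptions: it uses the JDK's built-in JAXP processor, which only implements XSLT 1.0, so the stylesheet is simplified (no fct:quote, no trailer); records are terminated with ^B (&#x2;) as in the sample file from the question; and the class and method names are hypothetical. With the XML 1.0 declaration the parser should reject &#x1; before any transformation runs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class Xml11Delimiter {
    // Trimmed copy of the question's input document.
    static final String SAMPLE =
        "<Content xmlns=\"http://www.taleo.com/ws/integration/toolkit/2005/07\"><ExportXML>"
        + "<record><field name=\"Number\">12663342</field>"
        + "<field name=\"FileName\">Document.pdf</field>"
        + "<field name=\"LastModificationDate\">2022-07-17 16:31:29</field></record>"
        + "<record><field name=\"Number\">12663324</field>"
        + "<field name=\"FileName\">Rishabh's| Resume.pdf</field>"
        + "<field name=\"LastModificationDate\">2022-07-17 06:38:44</field></record>"
        + "</ExportXML></Content>";

    // Simplified XSLT 1.0 stylesheet; only the XML declaration differs between variants.
    // Even in XML 1.1, U+0001 and U+0002 must be written as character references.
    static String stylesheet(String xmlVersion) {
        return "<?xml version=\"" + xmlVersion + "\"?>"
            + "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:itk=\"http://www.taleo.com/ws/integration/toolkit/2005/07\">"
            + "<xsl:output method=\"text\" encoding=\"UTF-8\"/>"
            + "<xsl:param name=\"csvDelimiter\">&#x1;</xsl:param>"
            + "<xsl:param name=\"recordTerminator\">&#x2;</xsl:param>"
            + "<xsl:template match=\"/\">"
            + "<xsl:for-each select=\"//itk:record\">"
            + "<xsl:for-each select=\"itk:field\">"
            + "<xsl:value-of select=\".\"/>"
            + "<xsl:if test=\"position() != last()\">"
            + "<xsl:value-of select=\"$csvDelimiter\"/></xsl:if>"
            + "</xsl:for-each>"
            + "<xsl:value-of select=\"$recordTerminator\"/>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    // Compiles the stylesheet with the default JAXP TransformerFactory and runs it.
    static String transform(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String csv = transform(stylesheet("1.1"), SAMPLE);
        // Make the control characters visible when printing.
        System.out.println(csv.replace("\u0001", "^A").replace("\u0002", "^B"));
        try {
            transform(stylesheet("1.0"), SAMPLE);
        } catch (TransformerException e) {
            System.out.println("XML 1.0 stylesheet rejected: " + e.getMessage());
        }
    }
}
```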
Answer:
The essential problem here is that XSLT only handles the XML character set, even when generating CSV files which are not XML, and \u0001 is not available in the XML 1.0 character set - though it is available in XML 1.1.
If you can't use XML 1.1, one workaround would be to use a different character, and then post-process the output using a tool such as sed. You could even do this post-processing "inline" in a custom Stream supplied as the output destination of your XSLT processor (the details depend on your chosen XSLT processor and its API).
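A minimal sketch of the "custom stream" idea, assuming a Java XSLT processor driven through JAXP (here the JDK's built-in XSLT 1.0 processor, so the stylesheet is simplified and drops fct:quote). The stylesheet stays XML 1.0 and writes the private-use placeholders U+E001 and U+E002, which are legal in XML 1.0; the hypothetical TranslatingWriter class rewrites them to ^A and ^B as the output is written, much like piping through tr. The placeholder choice is an assumption: pick characters that cannot occur in your data.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: maps from.charAt(i) to to.charAt(i) as text passes through.
class TranslatingWriter extends FilterWriter {
    private final String from;
    private final String to;

    TranslatingWriter(Writer out, String from, String to) {
        super(out);
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("from and to must have the same length");
        }
        this.from = from;
        this.to = to;
    }

    private char map(char c) {
        int i = from.indexOf(c);
        return i >= 0 ? to.charAt(i) : c;
    }

    @Override
    public void write(int c) throws IOException {
        out.write(map((char) c));
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        char[] mapped = new char[len];
        for (int i = 0; i < len; i++) {
            mapped[i] = map(cbuf[off + i]);
        }
        out.write(mapped, 0, len);
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        write(str.toCharArray(), off, len);
    }
}

class PostProcessDemo {
    static final String SAMPLE =
        "<Content xmlns=\"http://www.taleo.com/ws/integration/toolkit/2005/07\"><ExportXML>"
        + "<record><field name=\"Number\">12663342</field>"
        + "<field name=\"FileName\">Document.pdf</field>"
        + "<field name=\"LastModificationDate\">2022-07-17 16:31:29</field></record>"
        + "<record><field name=\"Number\">12663324</field>"
        + "<field name=\"FileName\">Rishabh's| Resume.pdf</field>"
        + "<field name=\"LastModificationDate\">2022-07-17 06:38:44</field></record>"
        + "</ExportXML></Content>";

    // XML 1.0 stylesheet: U+E001 and U+E002 stand in for ^A and ^B.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:itk=\"http://www.taleo.com/ws/integration/toolkit/2005/07\">"
        + "<xsl:output method=\"text\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//itk:record\">"
        + "<xsl:for-each select=\"itk:field\">"
        + "<xsl:value-of select=\".\"/>"
        + "<xsl:if test=\"position() != last()\">&#xE001;</xsl:if>"
        + "</xsl:for-each>"
        + "<xsl:text>&#xE002;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) throws TransformerException, IOException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter sink = new StringWriter();
        // The filter sits between the XSLT processor and the real destination.
        try (Writer out = new TranslatingWriter(sink, "\uE001\uE002", "\u0001\u0002")) {
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        }
        return sink.toString();
    }

    public static void main(String[] args) throws Exception {
        String csv = transform(SAMPLE);
        // Make the control characters visible when printing.
        System.out.println(csv.replace("\u0001", "^A").replace("\u0002", "^B"));
    }
}
```

With an OutputStream destination (e.g. a file), the same idea works by wrapping the stream in an OutputStreamWriter and then in the filter.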