I am trying to transform XML (UTF-8 encoding) to CSV (windows-1251 encoding), but I get an error:
net.sf.saxon.trans.DynamicError: Output character not available in this encoding (decimal 160)
I understand that the XML text contains a character with code 160, which is not available in windows-1251.
I tried to clean the XML before the transformation, but it doesn't help:
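import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

Charset charset = Charset.forName("windows-1251");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();
encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
String result = s;
try {
    // Round trip: encode to windows-1251 bytes, then decode back to a String
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(s));
    CharBuffer cbuf = decoder.decode(bbuf);
    result = cbuf.toString();
} catch (CharacterCodingException cce) {
    log.error("Exception during character encoding/decoding: " + cce.getMessage());
}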
What is the best way to solve this problem?
My XSL sample:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE csv-style [
<!ENTITY semicolons ';;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;'>
<!ENTITY commas ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'>
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" >
<xsl:output method="text" indent="no" omit-xml-declaration="yes" encoding="windows-1251"/>
<xsl:param name="delim">semicolon</xsl:param>
<xsl:param name="showHead">yes</xsl:param>
<xsl:variable name="delimStr">
<xsl:choose>
<xsl:when test="$delim = 'comma'">&commas;</xsl:when>
<xsl:otherwise>&semicolons;</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:template match="blocks">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="description|pair|foot|body/table/head">
<!-- don't do anything just skip it-->
</xsl:template>
<xsl:template match="table">
<xsl:apply-templates select="table|head|body"/>
</xsl:template>
<xsl:template match="col">
<xsl:if test="position()=1">
<xsl:value-of select="substring($delimStr, 1, @id - 1)"/>
</xsl:if>
<xsl:choose>
<xsl:when test="@value">
<xsl:text>"</xsl:text><xsl:variable name="escape">
<xsl:call-template name="_replace_string">
<xsl:with-param name="string" select="@value" />
</xsl:call-template>
</xsl:variable>
<xsl:value-of select="$escape" /><xsl:text>"</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:text>""</xsl:text>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
<xsl:choose>
<xsl:when test="position()=last()">
<xsl:value-of select="substring($delimStr, 1, ancestor::table[1]/@colNum - @id)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="substring($delimStr, 1, following-sibling::col[1]/@id - @id)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template> <!-- col -->
<xsl:template match="row">
<xsl:if test="col[@value][1]">
<xsl:apply-templates select="col"/>
<xsl:text> </xsl:text>
</xsl:if>
</xsl:template>
<xsl:template match="head">
<xsl:if test="$showHead = 'yes'">
<xsl:apply-templates select="*"/>
</xsl:if>
</xsl:template>
<xsl:template match="body">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template name="_replace_string">
<xsl:param name="string" select="''"/>
<xsl:variable name="find">"</xsl:variable>
<xsl:variable name="replace">""</xsl:variable>
<xsl:choose>
<xsl:when test="contains($string,$find)">
<xsl:value-of select="concat(substring-before($string,$find),$replace)"/>
<xsl:call-template name="_replace_string">
<xsl:with-param name="string" select="substring-after($string,$find)"/>
<xsl:with-param name="find" select="$find"/>
<xsl:with-param name="replace" select="$replace"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$string"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
My XML sample:
<?xml version="1.0" encoding="UTF-8" ?><blocks type="report"><functions><func num="4" text=" nameOf_10031"></func><func num="5" text="name Of_10071"></func><func num="6" text="name Of_10006"></func></functions><description name="[441] testesttest with 160 "><rows total="44" start="1" end="44" show-data="yes"></rows><columns count="10"><column id="1" type="4" position="1" width="" format="'dd.mm.yyyy'"></column><column id="2" type="4" position="2" width="" format="'dd.mm.yyyy'"></column><column id="3" type="3" position="3" width=""></column><column id="4" type="2" position="4" width=""></column><column id="5" type="2" position="5" width=""></column><column id="6" type="2" position="6" width=""></column><column id="7" type="2" position="7" width=""></column><column id="8" type="2" position="8" width=""></column><column id="9" type="2" position="9" width=""></column><column id="10" type="2" position="10" width=""></column></columns></description><pair name="ReportName" value="test test test "></pair><table colNum="10" id="12561"><head><row><col id="1" value="test test test"></col><col id="2" value=" test test test"></col><col id="3" value="test test test"></col><col id="4" value="test test test"></col><col id="5" value="test test test"></col><col id="6" value="test test test"></col><col id="7" value="test test test"></col><col id="8" value=" test test test"></col><col id="9" value="test test test"></col><col id="10" value="test test test"></col></row></head><body><row num="1"><col id="1" value="01.07.2006"></col><col id="2"></col><col id="3" value="53363"></col><col id="4" value="65187" record-id="65187"></col><col id="5" value="53363" record-id="53368"></col><col id="6" value="test test test" record-id="1974"></col><col id="7"></col><col id="8"></col><col id="9" value="test test test"></col><col id="10"></col></row></body></table></blocks>
When I try
java -cp saxon-9.1.0.8.jar net.sf.saxon.Transform -t -s:myxml.xml -xsl:myxsl.xsl -o:result.csv
I get the same error (160):
Saxon 9.1.0.8J from Saxonica
Java version 1.8.0_333
Warning: at xsl:stylesheet on line 11 column 81 of myxsl.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Stylesheet compilation time: 378 milliseconds
Processing file:/D:/111/myxml2.xml
Building tree for file:/D:/111/myxml2.xml using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 4 milliseconds
Tree size: 46 nodes, 0 characters, 99 attributes
Loading net.sf.saxon.event.MessageEmitter
Error at xsl:value-of on line 46 of myxsl.xsl:
Output character not available in this encoding (decimal 160)
at xsl:apply-templates (file:/D:/111/myxsl.xsl#66)
processing /blocks/table[1]/head[1]/row[1]/col[2]
at xsl:apply-templates (file:/D:/111/myxsl.xsl#73)
processing /blocks/table[1]/head[1]/row[1]
at xsl:apply-templates (file:/D:/111/myxsl.xsl#32)
processing /blocks/table[1]/head[1]
at xsl:apply-templates (file:/D:/111/myxsl.xsl#24)
processing /blocks/table[1]
in built-in template rule
Transformation failed: Run-time errors were reported
When I use a newer version, for example Saxon-HE-10.3.jar, there are no problems, but unfortunately I can't upgrade to it.
CodePudding user response:
You are using a very old (and unsupported) version of Saxon. In Saxon 9.1 (released in 2009) the software maintained its own data tables for character encoding, rather than getting it all from the JDK. According to the definition of CP1251 used in the Saxon 9.1 data tables, there is no mapping for the Unicode codepoint 160. The relevant source code contains a link to the URI http://www.microsoft.com/globaldev/reference/sbcs/1251.htm as its source of information, but that web page is no longer available.
Sorry we can't help you more, but this kind of thing happens if you don't upgrade your software from time to time.
Your best way forward is probably to output the data in UTF-8 encoding and then use some other utility to convert the CSV file from UTF-8 to CP1251.
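As a rough illustration of that second step, here is a minimal Java sketch of such a conversion utility. It assumes the stylesheet's xsl:output has been switched to encoding="UTF-8" so that Saxon writes a UTF-8 file first. The class name Utf8ToCp1251 and its methods are made up for this example and are not part of Saxon; it relies only on the JDK's own windows-1251 tables, and it sticks to APIs available in Java 8.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Saxon): re-encodes a UTF-8 file as windows-1251.
class Utf8ToCp1251 {

    private static final Charset CP1251 = Charset.forName("windows-1251");

    // String.getBytes(Charset) substitutes the charset's default replacement
    // for any character the charset cannot represent, instead of throwing.
    static byte[] encode(String text) {
        return text.getBytes(CP1251);
    }

    // Reads the whole file into memory, which should be fine for report-sized CSVs.
    static void convert(Path utf8Input, Path cp1251Output) throws IOException {
        String text = new String(Files.readAllBytes(utf8Input), StandardCharsets.UTF_8);
        Files.write(cp1251Output, encode(text));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: UTF-8 file written by Saxon, args[1]: windows-1251 file to create
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

It could be run as, for example, java Utf8ToCp1251 result-utf8.csv result.csv (both file names are placeholders). The JDK's windows-1251 table does include codepoint 160 (as byte 0xA0), so the non-breaking space survives this conversion; characters that genuinely have no windows-1251 equivalent are replaced rather than causing an error.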
CodePudding user response:
A character map that maps, e.g., the non-breaking space (160) to a normal space (32) would be:
<xsl:character-map name="m1">
<xsl:output-character character="&#160;" string=" "/>
</xsl:character-map>
<xsl:output use-character-maps="m1"/>
Character maps have been supported since XSLT 2.0, and I think Saxon 8.9 was the first version to implement the 2.0 standard, so 9.1 should cover that.
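If the stylesheet cannot be changed, the same substitution as the character map can probably be done on the Java side before the text is handed to the transformer. The sketch below is only an illustration: the class NbspCleaner and its method names are invented for this example. It also shows why the encode/decode round trip from the question does not remove the character: the JDK's windows-1251 charset can encode U+00A0, so nothing gets replaced, while Saxon 9.1 uses its own table.

```java
import java.nio.charset.Charset;

// Hypothetical helper: a Java-side counterpart of the character map above.
class NbspCleaner {

    private static final Charset CP1251 = Charset.forName("windows-1251");

    // Replaces every non-breaking space (U+00A0, decimal 160) with a plain space.
    static String replaceNbsp(String s) {
        return s.replace('\u00A0', ' ');
    }

    // Asks the JDK (not Saxon) whether windows-1251 can represent U+00A0.
    static boolean jdkCanEncodeNbsp() {
        return CP1251.newEncoder().canEncode('\u00A0');
    }

    // Encodes to windows-1251 bytes and decodes back, like the code in the question.
    static String roundTrip(String s) {
        return new String(s.getBytes(CP1251), CP1251);
    }

    public static void main(String[] args) {
        String sample = "test\u00A0test";
        System.out.println("JDK can encode U+00A0: " + jdkCanEncodeNbsp());
        System.out.println("Round trip keeps U+00A0: " + roundTrip(sample).equals(sample));
        System.out.println("Cleaned contains U+00A0: "
                + (replaceNbsp(sample).indexOf('\u00A0') >= 0));
    }
}
```

Applying replaceNbsp to the XML text (or to the individual attribute values) before the transformation removes the character that Saxon 9.1 rejects. The character map does the same thing at serialization time and keeps the fix inside the stylesheet.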