How to keep XSLT from introducing whitespaces in HTML output

Time:03-22

I am generating HTML from XML sources using XSLT. The HTML contains a lot of whitespace that was not in the original XML files. Normally this is not a problem, as the browser ignores the extra whitespace characters. But I am developing an application that relies on correct positioning of the text cursor inside the HTML page. The added whitespace throws off the offsets, making it impossible to reliably position the cursor inside an element.

My question: how can I get my XSLT to not introduce any additional whitespace in text nodes? I am using <xsl:strip-space elements="*"/>, but that does not keep the processor from introducing lots of whitespace. It looks like some pretty-printing is applied to the HTML, and I have no idea where it comes from. I am currently using Saxon PE 9.9.1.7.

[Edit]

I created a simple example that shows the same strange behaviour. First the XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <p>This is a long sentence. Trying to reproduce a whitespace handling problem with XSLT. This manual describes the spacecraft, safety aspects, usage and maintenance procedures. Make sure the manual is available to anyone who will be using the product.</p>
</root>

Here is the simplified XSL:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="1.0">

    <xsl:output method="html" encoding="UTF-8"/>

    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="root">
        <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;&#xD;</xsl:text>
        <html>
            <head>
                <title>Test</title>
            </head>
            <body>
                <xsl:apply-templates select="*"/>
                <script src="cursor.js"></script>
            </body>
        </html>
    </xsl:template> 

    <xsl:template match="p">
        <p contenteditable="true" id="p1" onclick="show_position()">
            <xsl:value-of select="."/>
        </p>
    </xsl:template>

</xsl:stylesheet>

The JavaScript file to show the current cursor position:

function show_position( )
{
    alert('position: ' + document.getSelection().anchorOffset);
}

The HTML that is generated by the XSLT looks like this (shown in oXygen):

<!DOCTYPE html>
<html>
   <head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
       <title>Test</title>
   </head>
   <body>
      <p contenteditable="true" id="p1" onclick="show_position()">This is a long sentence. Trying to reproduce a whitespace handling problem with XSLT.
         This manual describes the spacecraft, safety aspects, usage and maintenance procedures.
         Make sure the manual is available to anyone who will be using the product.</p><script src="cursor.js"></script></body>
</html>

Viewing the HTML in a browser makes each run of extra whitespace collapse into a single space, as expected. Clicking inside the paragraph shows the current offset from the start of the paragraph. Clicking immediately before 'This manual' shows position 86. Clicking one character to the right shows position 96. The same extra whitespace is introduced before the sentence starting with 'Make sure'.

I tried with Chrome and Safari - both show identical results. It does not seem to be a browser problem, but an issue with HTML generation by the XSLT processor. I have tried other Saxon versions but the resulting HTML is always the same.

Any further info on how to prevent these extra whitespace characters in my HTML output would be highly appreciated.

CodePudding user response:

The default for output method="html" is indent="yes", so the first thing to try is explicitly setting indent="no" on your xsl:output declaration.
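
As a minimal sketch, the only change needed in the stylesheet from the question would be the extra indent attribute on the existing xsl:output declaration; everything else stays as posted:

```xml
<!-- Same declaration as in the question, with serializer indentation switched off -->
<xsl:output method="html" encoding="UTF-8" indent="no"/>
```

With indentation off, the serializer should no longer insert line breaks and padding inside the p element, so the text node keeps the offsets it had in the source XML.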

Additionally, since you are using Saxon PE 9.9, you have access to XSLT 3 features such as suppress-indentation="p", and/or to Saxon PE/EE-specific serialization settings that let you set the normal line length to a very high value; check the documentation for e.g. saxon:line-length or similar.
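
If you want to keep indentation for the rest of the document, the two options might be combined roughly as follows. This is a sketch under assumptions, not tested against 9.9.1.7: it assumes the stylesheet is switched to version="3.0", that the saxon prefix is bound to the Saxon extension namespace, and the value 10000 is an arbitrary "large enough" number chosen for illustration. Check the Saxon serialization documentation for the exact name and behaviour of the line-length parameter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon"
    version="3.0">

    <!-- suppress-indentation: standard XSLT 3.0 parameter; the serializer
         adds no whitespace inside the listed elements (here: p).
         saxon:line-length: Saxon PE/EE extension (assumed available here);
         10000 is an illustrative value, not a recommended setting. -->
    <xsl:output method="html" encoding="UTF-8" indent="yes"
        suppress-indentation="p"
        saxon:line-length="10000"/>

    <xsl:strip-space elements="*"/>

    <!-- templates from the question go here, unchanged -->

</xsl:stylesheet>
```

Either attribute can be used on its own; suppress-indentation is the more portable choice because it is part of the XSLT 3.0 serialization spec rather than a vendor extension.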
