XSLT streaming is too slow with if conditions

Time:05-06

We are using Saxon-EE streaming to process a big file; in this case, the file is around 1 GB. The transformation looks up each order in a lookup file and filters on the matching order_id.

The transformation takes about 1.5 hours when I use the lookup/filtering.

If I comment out the lookup and if checks, it takes only 2 mins to transform the complete file.

It seems to be an issue with the way I am using the lookup and the if condition. Please suggest how to fix this performance issue.

Sample input XML

<?xml version="1.0" encoding="UTF-8"?>
<orders>
   <order>
      <guid>3079866431</guid>
      <name>name1</name>
   </order>
   <order>
      <guid>3079866431</guid>
      <name>name2</name>
   </order>
   <order>
      <guid>2583715475</guid>
      <name>name3</name>
   </order>
</orders>

lookup.xml file content

<?xml version="1.0"?><IndexControl><entry id="2521202370" status="true"/><entry id="2583715475" status="true"/></IndexControl>

XSLT template with lookups takes 1.5 hours

<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:mode streamable="yes"/>    
    <xsl:variable name="IndexLookup" select="document('https://test.com/lookup.xml')/IndexControl"/>
    <xsl:template match="orders">
        <xsl:element name="Batch">            
            <xsl:for-each select="order ! copy-of(.)">              
                 <xsl:variable name="order_id" select="guid"/>
              <xsl:if test="$IndexLookup/entry[@id=$order_id]/@status = 'true'">
                <xsl:element name="Order">
                    <xsl:element name="Field">
                        <xsl:attribute name="name">id</xsl:attribute>
                        <xsl:attribute name="value">
                            <xsl:value-of select="guid"/>
                        </xsl:attribute>
                    </xsl:element>
                    <xsl:element name="Field">
                        <xsl:attribute name="name">name</xsl:attribute>
                        <xsl:attribute name="value">
                            <xsl:value-of select="name"/>
                        </xsl:attribute>
                    </xsl:element>                  
                </xsl:element>               
                </xsl:if>
            </xsl:for-each>           
        </xsl:element>       
    </xsl:template>
</xsl:stylesheet>

XSLT without lookup takes 2 mins

<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:mode streamable="yes"/>    
    <!-- <xsl:variable name="IndexLookup" select="document('https://test.com/lookup.xml')/IndexControl"/> -->
    <xsl:template match="orders">
        <xsl:element name="Batch">            
            <xsl:for-each select="order ! copy-of(.)">               
                <xsl:variable name="order_id" select="guid"/>
              <!-- <xsl:if test="$IndexLookup/entry[@id=$order_id]/@status = 'true'"> -->
                <xsl:element name="Order">
                    <xsl:element name="Field">
                        <xsl:attribute name="name">id</xsl:attribute>
                        <xsl:attribute name="value">
                            <xsl:value-of select="guid"/>
                        </xsl:attribute>
                    </xsl:element>
                    <xsl:element name="Field">
                        <xsl:attribute name="name">name</xsl:attribute>
                        <xsl:attribute name="value">
                            <xsl:value-of select="name"/>
                        </xsl:attribute>
                    </xsl:element>                  
                </xsl:element>               
               <!--  </xsl:if> -->
            </xsl:for-each>           
        </xsl:element>       
    </xsl:template>
</xsl:stylesheet>

CodePudding user response:

Declare a key <xsl:key name="lookup" match="IndexControl/entry" use="@id"/> and then use <xsl:for-each select="order ! copy-of(.)[key('lookup', guid, doc('https://test.com/lookup.xml'))/@status = 'true']">.
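
Put together with the stylesheet from the question, the key-based approach might look like the sketch below. Binding the lookup document to a global variable instead of calling doc() inside the select is an assumption made here for readability, not part of the suggestion above; the URL and element names are taken from the question. The idea is that each order does one keyed probe into the lookup document instead of evaluating a predicate over all of its entry elements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:mode streamable="yes"/>
    <!-- Index the lookup entries by their id attribute. -->
    <xsl:key name="lookup" match="IndexControl/entry" use="@id"/>
    <!-- Assumption: the lookup document is loaded once into a global variable.
         It is an ordinary in-memory document, not a streamed one. -->
    <xsl:variable name="IndexLookup" select="doc('https://test.com/lookup.xml')"/>
    <xsl:template match="orders">
        <xsl:element name="Batch">
            <!-- copy-of(.) grounds each order, so the predicate can read guid freely;
                 key() then finds the matching entry in the lookup document. -->
            <xsl:for-each select="order ! copy-of(.)[key('lookup', guid, $IndexLookup)/@status = 'true']">
                <xsl:element name="Order">
                    <xsl:element name="Field">
                        <xsl:attribute name="name">id</xsl:attribute>
                        <xsl:attribute name="value">
                            <xsl:value-of select="guid"/>
                        </xsl:attribute>
                    </xsl:element>
                    <xsl:element name="Field">
                        <xsl:attribute name="name">name</xsl:attribute>
                        <xsl:attribute name="value">
                            <xsl:value-of select="name"/>
                        </xsl:attribute>
                    </xsl:element>
                </xsl:element>
            </xsl:for-each>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
```

With the sample input and lookup.xml from the question, only the order with guid 2583715475 has a matching entry, so only that order should appear inside Batch.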

CodePudding user response:

I would have expected the Saxon-EE optimiser to generate an index for the lookup expression; if I get a chance, I will investigate why this is not happening. But certainly, using an explicit key as Martin Honnen suggests should fix it.

For streaming large files I usually reckon that around 1 minute per gigabyte is a reasonable target, but it obviously depends on the work you are doing and the machine you are running it on.

Incidentally, it won't affect performance, but I do find this kind of code very unreadable:

 <xsl:element name="Field">
      <xsl:attribute name="name">name</xsl:attribute>
      <xsl:attribute name="value">
           <xsl:value-of select="name"/>
      </xsl:attribute>
 </xsl:element>

when you could write instead:

<Field name="name" value="{name}"/>
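
Applied to the whole template from the question, the same style could look like the sketch below. It is only an illustration of the literal-result-element form: it uses the unfiltered for-each from the question's second stylesheet, and the lookup condition would still need to be added to the select.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:mode streamable="yes"/>
    <xsl:template match="orders">
        <!-- Literal result elements replace xsl:element;
             attribute value templates replace xsl:attribute with xsl:value-of. -->
        <Batch>
            <xsl:for-each select="order ! copy-of(.)">
                <Order>
                    <Field name="id" value="{guid}"/>
                    <Field name="name" value="{name}"/>
                </Order>
            </xsl:for-each>
        </Batch>
    </xsl:template>
</xsl:stylesheet>
```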