We are using Saxon-EE streaming to process a large file, around 1 GB in this case. The transformation looks up each order in a lookup document and keeps only the orders whose order ID has a matching entry.
With the lookup and filtering in place, the transformation takes about 1.5 hours.
If I comment out the lookup and the xsl:if check, it takes only 2 minutes to transform the complete file.
The problem seems to be the way I am using the lookup and the if condition. Please suggest how to fix this performance issue.
Sample input XML:
<?xml version="1.0" encoding="UTF-8"?>
<orders>
  <order>
    <guid>3079866431</guid>
    <name>name1</name>
  </order>
  <order>
    <guid>3079866431</guid>
    <name>name2</name>
  </order>
  <order>
    <guid>2583715475</guid>
    <name>name3</name>
  </order>
</orders>
Content of lookup.xml:
<?xml version="1.0"?>
<IndexControl>
  <entry id="2521202370" status="true"/>
  <entry id="2583715475" status="true"/>
</IndexControl>
XSLT with the lookup (takes 1.5 hours):
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:mode streamable="yes"/>
  <xsl:variable name="IndexLookup" select="document('https://test.com/lookup.xml')/IndexControl"/>
  <xsl:template match="orders">
    <xsl:element name="Batch">
      <xsl:for-each select="order ! copy-of(.)">
        <xsl:variable name="order_id" select="guid"/>
        <xsl:if test="$IndexLookup/entry[@id=$order_id]/@status = 'true'">
          <xsl:element name="Order">
            <xsl:element name="Field">
              <xsl:attribute name="name">id</xsl:attribute>
              <xsl:attribute name="value">
                <xsl:value-of select="guid"/>
              </xsl:attribute>
            </xsl:element>
            <xsl:element name="Field">
              <xsl:attribute name="name">name</xsl:attribute>
              <xsl:attribute name="value">
                <xsl:value-of select="name"/>
              </xsl:attribute>
            </xsl:element>
          </xsl:element>
        </xsl:if>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
XSLT without the lookup (takes 2 minutes):
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:mode streamable="yes"/>
  <!-- <xsl:variable name="IndexLookup" select="document('https://test.com/lookup.xml')/IndexControl"/> -->
  <xsl:template match="orders">
    <xsl:element name="Batch">
      <xsl:for-each select="order ! copy-of(.)">
        <xsl:variable name="order_id" select="guid"/>
        <!-- <xsl:if test="$IndexLookup/entry[@id=$order_id]/@status = 'true'"> -->
        <xsl:element name="Order">
          <xsl:element name="Field">
            <xsl:attribute name="name">id</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="guid"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">name</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="name"/>
            </xsl:attribute>
          </xsl:element>
        </xsl:element>
        <!-- </xsl:if> -->
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
CodePudding user response:
Declare a key:
<xsl:key name="lookup" match="IndexControl/entry" use="@id"/>
and then use it to filter the orders:
<xsl:for-each select="order ! copy-of(.)[key('lookup', guid, doc('https://test.com/lookup.xml'))/@status = 'true']">
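For context, here is a minimal sketch of how the key declaration and the filtered xsl:for-each might fit into the stylesheet from the question. It keeps the question's element constructors unchanged so that only the lookup differs. Holding the lookup document in the global variable IndexLookup and passing it as the third argument of key() is an assumption of this sketch, not something the answer prescribes, and the sketch has not been timed against the 1 GB input.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:mode streamable="yes"/>

  <!-- Index the lookup entries by @id so orders can be matched through the key -->
  <xsl:key name="lookup" match="IndexControl/entry" use="@id"/>

  <!-- Assumption: load the lookup document once as an ordinary (non-streamed) tree -->
  <xsl:variable name="IndexLookup" select="doc('https://test.com/lookup.xml')"/>

  <xsl:template match="orders">
    <xsl:element name="Batch">
      <!-- The predicate applies to each copied order, so guid is read from the copy -->
      <xsl:for-each select="order ! copy-of(.)[key('lookup', guid, $IndexLookup)/@status = 'true']">
        <xsl:element name="Order">
          <xsl:element name="Field">
            <xsl:attribute name="name">id</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="guid"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">name</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="name"/>
            </xsl:attribute>
          </xsl:element>
        </xsl:element>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Because the filter now lives in the select expression, the order_id variable and the xsl:if wrapper from the question are no longer needed. With the sample input and lookup.xml shown in the question, only the order with guid 2583715475 has a matching entry.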
CodePudding user response:
I would have expected the Saxon-EE optimiser to generate an index for the lookup expression; if I get a chance, I will investigate why this is not happening. But using an explicit key, as Martin Honnen suggests, should certainly fix it.
For streaming large files I usually reckon that around 1 minute per gigabyte is a reasonable target, but it obviously depends on the work you are doing and the machine you are running it on.
Incidentally, it won't affect performance, but I do find this kind of code very unreadable:
<xsl:element name="Field">
<xsl:attribute name="name">name</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="name"/>
</xsl:attribute>
</xsl:element>
when you could write instead:
<Field name="name" value="{name}"/>
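Applied to both fields of the Order element in the question, the same shorthand might look like the fragment below. It is a sketch meant to sit inside the existing xsl:for-each, where the curly braces are attribute value templates evaluated against the current order copy.

```xml
<!-- Literal result elements with attribute value templates; intended as a
     replacement for the xsl:element / xsl:attribute constructors inside
     the xsl:for-each of the question's stylesheet -->
<Order>
  <Field name="id" value="{guid}"/>
  <Field name="name" value="{name}"/>
</Order>
```

The outer Batch element can be written the same way, as a literal <Batch> element wrapping the xsl:for-each.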