I am trying to create a variable and assign its value to an element, but Saxon streaming stops working after this change. Please suggest how to resolve this.
sample XML [indented for readability]
<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <location>United States</location>
      <title>Warehouse manager</title>
      <city>Buford</city>
      <state>GA</state>
      <zip>30025</zip>
      <country>United States</country>
      <job_type/>
      <posted_at>2022-03-08</posted_at>
      <job_reference>123</job_reference>
      <company>A</company>
      <mobile_friendly_apply>No</mobile_friendly_apply>
      <category/>
      <html_jobs>Yes</html_jobs>
      <url>https://google.com</url>
      <body>test</body>
      <cpa>1</cpa>
      <cpc>2</cpc>
    </job>
  </jobs>
  <generation_time>2022-03-08 18:34:07 -0500</generation_time>
  <jobs_count>466</jobs_count>
</source>
XSLT code
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:template match="source">
    <xsl:variable name="feed_generation_time" select="generation_time"/>
    <Batch>
      <Header>
        <Field name="EmailPref" value="EmailOnlyIfErrors"/>
        <xsl:element name="Field">
          <xsl:attribute name="name">feed_generation_time</xsl:attribute>
          <xsl:attribute name="value">
            <xsl:value-of select="$feed_generation_time"/>
          </xsl:attribute>
        </xsl:element>
      </Header>
      <xsl:for-each select="jobs/job ! copy-of(.)">
        <xsl:variable name="feed_id" select="job_reference"/>
        <Job>
          <Field name="Action" value="Add"/>
          <Field name="Country" value="US"/>
          <xsl:element name="Field">
            <xsl:attribute name="name">JobTitle</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="title"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">Description</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="body"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">ApplyURL</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="url"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">City</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="city"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">State</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="state"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">PostalCode</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="zip"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">ContactCompany</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="company"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">DiscreteField1</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="category"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">ExternalClientKey</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="job_reference"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">Cpc</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="cpc"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:element name="Field">
            <xsl:attribute name="name">Cpa</xsl:attribute>
            <xsl:attribute name="value">
              <xsl:value-of select="cpa"/>
            </xsl:attribute>
          </xsl:element>
          <xsl:variable name="NormalizedEmployeeType" select="upper-case(title)"/>
        </Job>
      </xsl:for-each>
    </Batch>
  </xsl:template>
  <xsl:mode streamable="yes"/>
</xsl:stylesheet>
Error message
Template rule is not streamable
 * Operand {generation_time} of {let $feed_generation_time := ...} selects streamed nodes in a context that allows arbitrary navigation (line 6)
 * The result of the template rule can contain streamed nodes.
CodePudding user response:
A transformation that copies data from somewhere near the end of the source document to somewhere near the start of the result document is intrinsically non-streamable.
Can you redesign the workflow so that generation_time is generated at the start of the source document rather than at the end? Or even in a separate document?
If not, and assuming your real source actually does have so many jobs that you can't put them all in memory, I think that reading the source document twice is your only real option. It would involve changing your variable binding to something like this:
<!-- as="xs:string" needs xmlns:xs="http://www.w3.org/2001/XMLSchema" declared on xsl:stylesheet -->
<xsl:variable name="feed_generation_time" as="xs:string">
  <xsl:source-document streamable="yes" href="input.xml">
    <xsl:sequence select="string(/descendant::generation_time[1]/text())"/>
  </xsl:source-document>
</xsl:variable>
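To show where that binding might sit, here is a minimal sketch of the whole stylesheet, cut down to two Job fields. The input-uri parameter is a hypothetical name introduced for illustration (the snippet above hard-codes input.xml), and the literal result elements with attribute value templates are just a shorter way to write the xsl:element/xsl:attribute pairs from the question. I have not run this against your data, so treat it as a starting point rather than a finished solution.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs" version="3.0">

  <xsl:mode streamable="yes"/>
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical parameter: URI of the same file that is the main input. -->
  <xsl:param name="input-uri" as="xs:string" select="'input.xml'"/>

  <xsl:template match="source">
    <!-- First pass: a separate streamed read, only to fetch generation_time. -->
    <xsl:variable name="feed_generation_time" as="xs:string">
      <xsl:source-document streamable="yes" href="{$input-uri}">
        <xsl:sequence select="string(/descendant::generation_time[1]/text())"/>
      </xsl:source-document>
    </xsl:variable>
    <Batch>
      <Header>
        <Field name="EmailPref" value="EmailOnlyIfErrors"/>
        <Field name="feed_generation_time" value="{$feed_generation_time}"/>
      </Header>
      <!-- Second pass: the main streamed input, one job buffered at a time. -->
      <xsl:for-each select="jobs/job ! copy-of(.)">
        <Job>
          <Field name="Action" value="Add"/>
          <Field name="JobTitle" value="{title}"/>
          <Field name="City" value="{city}"/>
        </Job>
      </xsl:for-each>
    </Batch>
  </xsl:template>
</xsl:stylesheet>
```

A relative href is resolved against the base URI of the stylesheet, not of the source document, so in practice you would probably pass an absolute URI for input-uri.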
Martin's suggestion of using xsl:fork isn't really going to help. The effect of xsl:fork is that instead of holding all the input in memory for the duration of the transformation, you hold all the output in memory instead. That can solve your problem in cases where the output is much smaller than the input, but that doesn't seem to be the case here.
CodePudding user response:
That is a difficult format for streaming: generation_time is at the end of the source element, after all the jobs/job elements, and streaming works forwards only. You can of course use copy-of calls to materialize (buffer) nodes, but I don't see much of a way to do that here, as you would need to use copy-of on the whole source element.
Or you can try your luck and rely on xsl:fork, e.g.
<xsl:fork>
  <xsl:sequence>
    <Header>
      <Field name="EmailPref" value="EmailOnlyIfErrors"/>
      <Field name="feed_generation_time" value="{generation_time}"/>
    </Header>
  </xsl:sequence>
  <xsl:sequence>
    <xsl:for-each select="jobs/job ! copy-of(.)">...</xsl:for-each>
  </xsl:sequence>
</xsl:fork>
A different strategy would be to read the document twice: first with xsl:source-document, just streaming through to collect the generation_time at the (near) end of source and keep it in a variable; then a second time as the main input, where you process your jobs/job elements as you currently do and prefix them with the Header and the feed_generation_time from the variable.
If generation_time preceded all the jobs/job elements, you could easily store it in an accumulator. As it is, the only way to use an accumulator in your case would be to have it store all the job data in lightweight maps and then, once your template processes generation_time, output that accumulator data as the XML you want. I don't know how large your job data is or how well that would work; you could try whether keeping your input jobs/job data in maps lets you get around the memory problems that I presume led you to XSLT 3 streaming in the first place.
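A rough sketch of that accumulator idea follows, untested against your real data. The accumulator name jobs and the one-map-per-job layout (keyed by child element name) are my own assumptions for illustration. It also assumes each child of job holds at most one text node, and it only outputs a few of the fields.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:map="http://www.w3.org/2005/xpath-functions/map"
                exclude-result-prefixes="xs map" version="3.0">

  <xsl:mode streamable="yes" use-accumulators="jobs"/>
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Holds one map per job, keyed by child element name (e.g. "title"). -->
  <xsl:accumulator name="jobs" as="map(xs:string, xs:string)*"
                   initial-value="()" streamable="yes">
    <!-- At the start of each job, append an empty map. -->
    <xsl:accumulator-rule match="jobs/job" select="$value, map{}"/>
    <!-- For text inside a child of job, put it into the most recent map. -->
    <xsl:accumulator-rule match="jobs/job/*/text()"
        select="$value[position() lt last()],
                map:put($value[last()], local-name(..), string(.))"/>
  </xsl:accumulator>

  <xsl:template match="source">
    <Batch>
      <!-- Only generation_time is processed by a template; the job data
           has already been collected by the accumulator at that point. -->
      <xsl:apply-templates select="generation_time"/>
    </Batch>
  </xsl:template>

  <xsl:template match="generation_time">
    <Header>
      <Field name="EmailPref" value="EmailOnlyIfErrors"/>
      <Field name="feed_generation_time" value="{.}"/>
    </Header>
    <!-- Replay the buffered jobs; the context item is one map per job. -->
    <xsl:for-each select="accumulator-before('jobs')">
      <Job>
        <Field name="Action" value="Add"/>
        <Field name="JobTitle" value="{?title}"/>
        <Field name="City" value="{?city}"/>
      </Job>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Empty elements such as job_type contribute no text node, so their key is simply absent from the map and the corresponding attribute comes out empty. All job maps stay in memory until generation_time is reached, so this only helps if the maps are noticeably smaller than the XML they replace.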