How to preserve the name of a file in a collection in an XSLT transformation?

Time:01-19

I have a collection of files I process in an XSLT transformation. The collection looks like this (not sure I can actually use a name attribute here):

<collection stable="true">
    <doc href="3690096.xml" name="3690096"/>
    <doc href="3690214.xml" name="3690214"/>
</collection>

In my transformation I load the collection into a variable, <xsl:variable name="my_collection" select="collection('collection.xml')"/>, and use $my_collection in a for-each loop to create an HTML page for each XML file:

        <xsl:for-each select="$my_collection">
            <xsl:result-document href="{concat('item_', position(),'.html')}" method="html">
                <xsl:call-template name="separate_page_for_file"/>
            </xsl:result-document>
        </xsl:for-each>

As you can see above, I use position() and the results are item_1.html and item_2.html.

What I want to achieve is to preserve the original ID of the file. So the desired outcome is 3690096.html and 3690214.html.

A stretch goal is to have these IDs available for other things too, because I have corresponding images named with those IDs (like 3690214_0.jpeg, 3690214_1.jpeg and so on) that I could look up.

In general I can address the name via //doc/@name, but not when I am in the context of the for-each loop over $my_collection.

CodePudding user response:

I would check whether e.g. <xsl:result-document href="{base-uri() => replace('\.xml$', '.html')}" method="html"> works.
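
If base-uri() does return the source file's URI in your setup, a variation on the same idea keeps only the file name, so the ID can be reused elsewhere. This is only a sketch: the variable name id is made up for illustration, and it assumes the URI ends in something like .../3690096.xml.

```xslt
<!-- Fragment for the template that currently holds the for-each loop. -->
<xsl:for-each select="$my_collection">
    <!-- Assumption: base-uri() ends in the source file name, e.g. .../3690096.xml.
         Keep the last path segment and drop the .xml extension. -->
    <xsl:variable name="id"
                  select="replace(tokenize(base-uri(), '/')[last()], '\.xml$', '')"/>
    <xsl:result-document href="{$id}.html" method="html">
        <xsl:call-template name="separate_page_for_file"/>
    </xsl:result-document>
</xsl:for-each>
```

Because the href is now a bare file name, the pages should be written relative to the base output URI rather than next to the source files. The same $id could be used to build the image names, for example concat($id, '_0.jpeg').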

CodePudding user response:

You should loop through $my_collection/collection/doc and then you can refer to @href and @name inside of the loop.
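
A minimal sketch of that approach, assuming the catalog is read as an ordinary XML document with doc() (collection() hands back the listed documents, not the catalog itself). The variable names catalog and id are made up for illustration.

```xslt
<!-- Read the catalog as plain XML so its doc elements and attributes are visible. -->
<xsl:variable name="catalog" select="doc('collection.xml')"/>

<!-- Fragment for the template that currently holds the for-each loop. -->
<xsl:for-each select="$catalog/collection/doc">
    <!-- Capture the ID while the context item is still the doc element. -->
    <xsl:variable name="id" select="string(@name)"/>
    <!-- document() resolves a relative @href against the catalog's own base URI;
         the inner for-each makes the loaded file the context item. -->
    <xsl:for-each select="document(@href)">
        <xsl:result-document href="{$id}.html" method="html">
            <xsl:call-template name="separate_page_for_file"/>
        </xsl:result-document>
    </xsl:for-each>
</xsl:for-each>
```

Read this way, the name attribute is just data in an XML file, so it does not matter whether the collection catalog format officially allows it.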

CodePudding user response:

There's a bit of history here. In the past Saxon's collection() function always returned documents that had a document-uri() property which usefully identified them. But then we found that didn't satisfy the rule that no two files are allowed to have the same document-uri(), so we changed that. You can follow some of the history starting at https://saxonica.plan.io/issues/5640

I think the simplest solution in your case might be to call uri-collection() in place of collection(). This gives you a set of URIs to work with, and you can retrieve the corresponding documents by calling doc().
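
Roughly, that could look like the sketch below. It assumes uri-collection() accepts the same catalog file as collection() and returns the resolved href values; the variable name id is made up for illustration.

```xslt
<!-- Fragment for the template that currently holds the for-each loop. -->
<xsl:for-each select="uri-collection('collection.xml')">
    <!-- The context item is a URI, not a document: take the last path
         segment and drop the .xml extension to get the ID. -->
    <xsl:variable name="id"
                  select="replace(tokenize(., '/')[last()], '\.xml$', '')"/>
    <!-- Load the document and make it the context item for the named template. -->
    <xsl:for-each select="doc(.)">
        <xsl:result-document href="{$id}.html" method="html">
            <xsl:call-template name="separate_page_for_file"/>
        </xsl:result-document>
    </xsl:for-each>
</xsl:for-each>
```

The ID is taken from the file name here rather than from a name attribute, so the catalog would not need that attribute at all.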

Alternatively, as Martin suggests, if your collection is actually controlled by an XML catalog file, then you could just process the referenced files individually.
