EDIT: I'm improving the original write-up of the issue (which was pretty bad to begin with; sorry folks!) and adding a bit of detail about what I've done since then.
My XSLT processor is Saxon 10.6 on Ubuntu, and I'm not limited to a specific XSLT version: 1.0, 2.0, or 3.0 is OK.
I am starting with an XML file that looks similar to:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<responseDate>2021-01-01</responseDate>
<AllRecords>
<record>
<header>
<id>1111-2222-3333-4444</id>
<set>set_a</set>
</header>
<metadata>
<blah>
<value_1>val1_a</value_1>
<value_2>val2_a</value_2>
<value_3>val3_a</value_3>
</blah>
</metadata>
</record>
<record>
<header>
<id>AAAA-BBBB-CCCC-DDDD</id>
<set>set_b</set>
</header>
<metadata>
<blah>
<value_1>val1_b</value_1>
<value_2>val2_b</value_2>
<value_3>val3_b</value_3>
</blah>
</metadata>
</record>
</AllRecords>
</foo>
Namespaces have been removed for ease of reading.
My ultimate goal is to do the following for every record in the set:
- Read the value of the record/header/id element for that record
- Extract portions of the value read in step 1 based on delimiters in the metadata
- Create a new node under record/metadata/blah whose value includes metadata derived from the record/header/id field's value
So, the desired output might look like:
<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns="http://www.loc.gov/mods/v3">
<responseDate>2021-01-01</responseDate>
<AllRecords>
<record>
<header>
<id>1111-2222-3333-4444</id>
<set>set_a</set>
</header>
<metadata>
<blah>
<value_1>val1_a</value_1>
<value_2>val2_a</value_2>
<value_3>val3_a</value_3>
<new-element>The value 1111 and 3333 came from the id element</new-element>
</blah>
</metadata>
</record>
<record>
<header>
<id>AAAA-BBBB-CCCC-DDDD</id>
<set>set_b</set>
</header>
<metadata>
<blah>
<value_1>val1_b</value_1>
<value_2>val2_b</value_2>
<value_3>val3_b</value_3>
<new-element>The value AAAA and CCCC came from the id element</new-element>
</blah>
</metadata>
</record>
</AllRecords>
</foo>
What is the best way to accomplish this? The documentation I've found all assumes that you're creating a child node rather than one that's multiple levels below the context node.
EDIT: After the initial write-up, I kept pounding on this and ended up with the following XSLT. It probably looks like gawdawful rubbish to anyone who knows this stuff properly, and it definitely needs more work, but it gives an idea of where I'm heading.
I may be heading to the destination through a swamp rather than via a logical path...that remains to be seen.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0" xmlns="http://www.loc.gov/mods/v3">
<xsl:output omit-xml-declaration="no" indent="yes"/>
<xsl:template match="/foo">
<foo>
<responseDate><xsl:value-of select="responseDate"/></responseDate>
<AllRecords>
<xsl:apply-templates select="//record"/>
</AllRecords>
</foo>
</xsl:template>
<xsl:template match="//record">
<record>
<xsl:copy-of select="header" xmlns="http://www.loc.gov/mods/v3"/>
<metadata>
<blah>
<xsl:apply-templates select="metadata/blah"/>
</blah>
</metadata>
</record>
</xsl:template>
<xsl:template match="blah">
<xsl:copy-of select="*"/>
<xsl:variable name="id-full" select="../../header/id"/>
<xsl:variable name="id-first" select="substring-before($id-full, '-')"/>
<xsl:variable name="id-second" select="substring-before(substring-after($id-full, '-'), '-')"/>
<xsl:variable name="id-third" select="substring-before(substring-after(substring-after($id-full, '-'), '-'), '-')"/>
<xsl:variable name="id-fourth" select="substring-after(substring-after(substring-after($id-full, '-'), '-'), '-')"/>
<xsl:element name="new-element">The value <xsl:value-of select='$id-first'/> and <xsl:value-of select='$id-third'/> came from the id element</xsl:element>
</xsl:template>
</xsl:stylesheet>
When I run that stylesheet against my XML data, it gives me the following:
<foo xmlns="http://www.loc.gov/mods/v3">
<responseDate>2021-01-01</responseDate>
<AllRecords>
<record>
<header xmlns="">
<id>1111-2222-3333-4444</id>
<set>set_a</set>
</header>
<metadata>
<blah>
<value_1 xmlns="">val1_a</value_1>
<value_2 xmlns="">val2_a</value_2>
<value_3 xmlns="">val3_a</value_3>
<new-element>The value 1111 and 3333 came from the id element</new-element>
</blah>
</metadata>
</record>
<record>
<header xmlns="">
<id>AAAA-BBBB-CCCC-DDDD</id>
<set>set_b</set>
</header>
<metadata>
<blah>
<value_1 xmlns="">val1_b</value_1>
<value_2 xmlns="">val2_b</value_2>
<value_3 xmlns="">val3_b</value_3>
<new-element>The value AAAA and CCCC came from the id element</new-element>
</blah>
</metadata>
</record>
</AllRecords>
</foo>
I still need to dig further into the null namespaces resulting from the copies, but this is largely accomplishing what I want. It's probably doing it in a horribly inefficient way, so I'll look at Martin's suggestion (from before I cleaned up this description) and I'll be glad to hear more suggestions!
CodePudding user response:
Write a template for the blah element and add the new-element there; you can select the id element with XPath:
<xsl:template match="record/metadata/blah">
<xsl:copy>
<xsl:apply-templates/>
<new-element>
<xsl:value-of select="ancestor::record/header/id"/>
</new-element>
</xsl:copy>
</xsl:template>
You haven't shown which values the blah child elements can have, nor how they are supposed to be used to infer/extract contents from the id element, so I have restricted the answer to showing how to select and output the complete id element value when constructing the new element. You can of course pass that value to any function or template that splits or tokenizes it based on the value elements' contents, which you have not shown.
The identity transformation (e.g. declared in XSLT 3 with <xsl:mode on-no-match="shallow-copy"/> or spelled out as a template in XSLT 1 or 2) is assumed as the base for the above solution.
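For XSLT 1.0 or 2.0, where xsl:mode is not available, the identity transformation has to be written out as a template. The following is a minimal sketch of a complete stylesheet along those lines, assuming the input is in no namespace as in the question; it only outputs the full id value, as the template above does.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copies every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- More specific than the identity template, so it wins for blah -->
  <xsl:template match="record/metadata/blah">
    <xsl:copy>
      <!-- Copy the existing attributes and children first -->
      <xsl:apply-templates select="@*|node()"/>
      <!-- Then append the new element with the complete id value -->
      <new-element>
        <xsl:value-of select="ancestor::record/header/id"/>
      </new-element>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because every other node falls through to the identity template, only the blah elements are changed and the rest of the document is passed through as is.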
CodePudding user response:
I still don't see any "delimiters in the metadata". Judging by your code and the expected result, you simply want to extract the first and the third token from a string delimited by the - character. This is easy to do in XSLT 2.0 or higher using the tokenize() function:
XSLT 3.0
<xsl:stylesheet version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
expand-text="yes">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="blah">
<xsl:copy>
<xsl:apply-templates/>
<xsl:variable name="tokens" select="tokenize(../../header/id, '-')" />
<new-element>The value {$tokens[1]} and {$tokens[3]} came from the id element</new-element>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Applied to your input example, this will return:
Result
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<responseDate>2021-01-01</responseDate>
<AllRecords>
<record>
<header>
<id>1111-2222-3333-4444</id>
<set>set_a</set>
</header>
<metadata>
<blah>
<value_1>val1_a</value_1>
<value_2>val2_a</value_2>
<value_3>val3_a</value_3>
<new-element>The value 1111 and 3333 came from the id element</new-element>
</blah>
</metadata>
</record>
<record>
<header>
<id>AAAA-BBBB-CCCC-DDDD</id>
<set>set_b</set>
</header>
<metadata>
<blah>
<value_1>val1_b</value_1>
<value_2>val2_b</value_2>
<value_3>val3_b</value_3>
<new-element>The value AAAA and CCCC came from the id element</new-element>
</blah>
</metadata>
</record>
</AllRecords>
</foo>
Do note that this result is produced by copying all nodes from the input tree to the output and adding a new element. Therefore the output elements are in no-namespace, same as the input. If you want the output elements to be in a different namespace, then you must recreate them in their target namespace instead of copying.
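If the output does need to be in a namespace, one possible approach is to replace the shallow copy with a template that recreates each element under its local name in the target namespace. The following is only a sketch: it assumes the target namespace is the http://www.loc.gov/mods/v3 URI shown in the question's desired output, and that the input elements are in no namespace.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.loc.gov/mods/v3"
    expand-text="yes">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- Recreate every element with the same local name in the target namespace.
       Text nodes are copied by the built-in rules; comments and processing
       instructions are dropped by them. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="http://www.loc.gov/mods/v3">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Attributes are copied unchanged -->
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>

  <!-- blah is recreated as a literal result element, which picks up the
       default namespace declared on xsl:stylesheet -->
  <xsl:template match="blah">
    <blah>
      <xsl:apply-templates select="@* | node()"/>
      <xsl:variable name="tokens" select="tokenize(../../header/id, '-')"/>
      <new-element>The value {$tokens[1]} and {$tokens[3]} came from the id element</new-element>
    </blah>
  </xsl:template>

</xsl:stylesheet>
```

Since every element is rebuilt in the same namespace rather than copied, the xmlns="" declarations seen in the question's own output should not appear.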