I need to replace some characters in a string stored in <p> with <app> nodes which contain a matching char (or substring) in a child element <lem>. Each <app> contains only one <lem> at the top, and an arbitrary number of other nodes below it. Each <app> only refers to a single character in the text, and they are placed in order.
I am new to XSLT and cannot come up with a good recursion to do this. I'm stuck in the Java or MATLAB mindset of iterating over i = 1:n and j = 1:m, and I understand that this does not take advantage of recursion in XSLT. Thanks for your help!
<div>
<p>SOMEWONDERFULOLDTEXT</p>
<app>
<lem>O</lem>
<rdg>Ø</rdg>
</app>
<app>
<lem>W</lem>
<rdg>V</rdg>
</app>
<app>
<lem>O</lem>
<rdg>Ö</rdg>
</app>
<app>
<lem>E</lem>
<rdg>Ë</rdg>
<rdg>ę</rdg>
</app>
</div>
My stylesheet so far is this, but I know it doesn't work because it iterates through the whole text for every <app>, which is wrong.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs tei" version="3.0">
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*"/>
</xsl:copy>
</xsl:template>
<!-- now build the apparatus -->
<xsl:template match="tei:div">
<xsl:param name="thisBlock" select="./tei:p/node()"/>
<xsl:for-each select="tei:app">
<xsl:variable name="thisApp" select="."/>
<xsl:for-each
select="tokenize(replace(replace($thisBlock, '(.)', '$1\\n'), '\\n$', ''), '\\n')">
<xsl:choose>
<xsl:when test="$thisApp/tei:lem/text() = .">
<xsl:copy-of select="$thisApp"></xsl:copy-of>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates></xsl:apply-templates>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The result I want is the following. What I actually get is a frightful mess, with the variant readings of O printed for every single O in the text regardless of order (of course, because I don't know how to iterate linearly along two "arrays"):
<div>
<p>S<app>
<lem>O</lem>
<rdg>Ø</rdg>
</app>ME<app>
<lem>W</lem>
<rdg>V</rdg>
</app><app>
<lem>O</lem>
<rdg>Ö</rdg>
</app>ND<app>
<lem>E</lem>
<rdg>Ë</rdg>
<rdg>ę</rdg>
</app>RFULOLDTEXT</p>
</div>
CodePudding user response:
I don't think you need recursion, if I've understood what you're trying to do. Here's how I might attack the problem:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs tei" version="3.0"
xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:template match="p/text()">
<!-- find the apps that apply to the current text node -->
<xsl:variable name="apps" select="ancestor::div/app"/>
<!-- parse the text node into a sequence of 1-char strings
This weird trick uses string-to-codepoints() to tokenize
the string into a sequence of character codepoints, and
then uses codepoints-to-string() to turn each integer
codepoint back into a 1-char string, yielding a sequence
of characters
-->
<xsl:variable name="characters" select="
for $codepoint in
string-to-codepoints(.)
return
codepoints-to-string($codepoint)
"/>
<!-- for each character, output a matching app if
there is one, or otherwise the character itself
-->
<xsl:for-each select="$characters">
<xsl:variable name="character" select="."/>
<xsl:variable name="app" select="$apps[lem = $character]"/>
<xsl:choose>
<xsl:when test="$app">
<xsl:copy-of select="$app"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
<!-- discard app elements -->
<xsl:template match="app"/>
<!-- copy everything else -->
<xsl:mode on-no-match="shallow-copy"/>
</xsl:stylesheet>
Applied to:
<div xmlns="http://www.tei-c.org/ns/1.0">
<p>SOMEWONDERFULOLDTEXT</p>
<app>
<lem>O</lem>
<rdg>Ø</rdg>
</app>
<app>
<lem>W</lem>
<rdg>V</rdg>
</app>
<app>
<lem>O</lem>
<rdg>Ö</rdg>
</app>
<app>
<lem>E</lem>
<rdg>Ë</rdg>
<rdg>ę</rdg>
</app>
</div>
Produces result:
<div xmlns="http://www.tei-c.org/ns/1.0">
<p>S<app>
<lem>O</lem>
<rdg>Ø</rdg>
</app><app>
<lem>O</lem>
<rdg>Ö</rdg>
</app>M<app>
<lem>E</lem>
<rdg>Ë</rdg>
<rdg>ę</rdg>
</app><app>
<lem>W</lem>
<rdg>V</rdg>
</app><app>
<lem>O</lem>
<rdg>Ø</rdg>
</app><app>
<lem>O</lem>
<rdg>Ö</rdg>
</app>ND<app>
<lem>E</lem>
<rdg>Ë</rdg>
<rdg>ę</rdg>
</app>RFUL<app>
<lem>O</lem>
<rdg>Ø</rdg>
</app><app>
<lem>O</lem>
<rdg>Ö</rdg>
</app>LDT<app>
<lem>E</lem>
<rdg>Ë</rdg>
<rdg>ę</rdg>
</app>XT</p>
</div>
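Note that this output repeats every matching app at every occurrence of its character, which differs from the result asked for in the question. If each app should instead be used exactly once, in document order, one possible variation is sketched below. It is an untested sketch that assumes an XSLT 3.0 processor with xsl:iterate support, and that every lem holds a single character (substrings are not handled). The parameter name pending is my own choice, not something from the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- copy everything that has no more specific template -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="p/text()">
    <!-- all apps of the enclosing div, in document order -->
    <xsl:variable name="apps" select="ancestor::div/app"/>
    <!-- split the text node into 1-char strings -->
    <xsl:variable name="characters" select="
      for $codepoint in string-to-codepoints(.)
      return codepoints-to-string($codepoint)"/>
    <xsl:iterate select="$characters">
      <!-- apps not yet placed (assumption: consumed strictly in order) -->
      <xsl:param name="pending" as="element()*" select="$apps"/>
      <xsl:choose>
        <!-- the next unused app matches this character: emit it -->
        <xsl:when test="$pending[1]/lem = .">
          <xsl:copy-of select="$pending[1]"/>
          <xsl:next-iteration>
            <xsl:with-param name="pending" select="tail($pending)"/>
          </xsl:next-iteration>
        </xsl:when>
        <!-- otherwise keep the character; pending stays unchanged -->
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:template>

  <!-- discard the original app elements -->
  <xsl:template match="app"/>
</xsl:stylesheet>
```

The idea is to walk the characters once while carrying the list of unused apps as an iteration parameter, so only the first pending app is ever compared with the current character. Any app whose lem never turns up in the remaining text is silently dropped, and it also blocks the apps after it, so you may want to add a check for that.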
CodePudding user response:
You could borrow from an XSLT 1.0 technique of Muenchian grouping to determine whether the occurrence of a character is the first in the string.
I would also use a key to link to the replacement app.
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="char" match="char" use="." />
<xsl:key name="app" match="app" use="lem" />
<xsl:template match="div">
<xsl:variable name="div" select="." />
<xsl:variable name="chars">
<xsl:analyze-string select="p" regex=".">
<xsl:matching-substring>
<char>
<xsl:value-of select="." />
</char>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:variable>
<!-- output -->
<div>
<xsl:for-each select="$chars/char">
<xsl:variable name="app" select="key('app', ., $div)" />
<xsl:choose>
<xsl:when test="count(. | key('char', .)[1]) = 1 and $app">
<xsl:copy-of select="$app[1]"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="." />
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</div>
</xsl:template>
</xsl:stylesheet>
Applied to your input example, this will return:
Result
<?xml version="1.0" encoding="UTF-8"?>
<div>S<app>
<lem>O</lem>
<rdg>Ø</rdg>
</app>M<app>
<lem>E</lem>
<rdg>Ë</rdg>
<rdg>ę</rdg>
</app>
<app>
<lem>W</lem>
<rdg>V</rdg>
</app>ONDERFULOLDTEXT</div>
This is slightly different from the result you show, but I suspect it is the correct one.
Note that by counting the $app variable you can easily detect multiple matches and handle them as you want.
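As a rough illustration of that last point, the sketch below lists every lem value that is claimed by more than one app, using the same app key. The report and duplicate element names are made up for this example, and the input is assumed to be in no namespace, as in the stylesheet above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- same key as above: look up app elements by their lem value -->
  <xsl:key name="app" match="app" use="lem"/>

  <xsl:template match="/">
    <!-- "report" and "duplicate" are hypothetical names for illustration -->
    <report>
      <!-- keep only the first app of each group that has more than one member -->
      <xsl:for-each select="//app[count(key('app', lem)) gt 1][. is key('app', lem)[1]]">
        <duplicate lem="{lem}" count="{count(key('app', lem))}"/>
      </xsl:for-each>
    </report>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should report a single duplicate, for O with a count of 2, since that is the only lem shared by two apps.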