Double iteration through chars in a string and nodes in XSLT - how to do it with recursion?

Time:07-26

I need to replace some characters in a string stored in <p> with <app> nodes which contain a matching char (or substring) in a child element <lem>. Each <app> contains only one <lem> at the top, and an arbitrary number of other nodes below it. Each <app> only refers to a single character in the text, and they are placed in order.

I am new to XSLT and cannot come up with a good recursion to do this. I'm stuck in the Java or MATLAB mindset of iterating over i = 1:n and j = 1:m, and I understand that this does not take advantage of recursion in XSLT. Thanks for your help!

<div>
            <p>SOMEWONDERFULOLDTEXT</p>
            <app>
               <lem>O</lem>
               <rdg>Ø</rdg>
            </app>
            <app>
               <lem>W</lem>
               <rdg>V</rdg>
            </app>
            <app>
               <lem>O</lem>
               <rdg>Ö</rdg>
            </app>
            <app>
               <lem>E</lem>
               <rdg>Ë</rdg>
               <rdg>ę</rdg>
            </app>
         </div>

My stylesheet so far is this, but I know it doesn't work because it is iterating through the text for every <app>, which is wrong.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs tei" version="3.0">

    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>

    <!-- now build the apparatus -->
    <xsl:template match="tei:div">
        <xsl:param name="thisBlock" select="./tei:p/node()"/>
        <xsl:for-each select="tei:app">
            <xsl:variable name="thisApp" select="."/>
            <xsl:for-each
                select="tokenize(replace(replace($thisBlock, '(.)', '$1\\n'), '\\n$', ''), '\\n')">
                <xsl:choose>
                    <xsl:when test="$thisApp/tei:lem/text() = .">
                    <xsl:copy-of select="$thisApp"></xsl:copy-of>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates></xsl:apply-templates>
                </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

The result I want is the following, although I am getting a frightful mess with each <app> containing the variant readings of O printed for every single O in the text, regardless of order (of course, because I don't know how to iterate linearly along two "arrays")...

<div>
            <p>S<app>
               <lem>O</lem>
               <rdg>Ø</rdg>
            </app>ME<app>
               <lem>W</lem>
               <rdg>V</rdg>
            </app><app>
               <lem>O</lem>
               <rdg>Ö</rdg>
            </app>ND<app>
               <lem>E</lem>
               <rdg>Ë</rdg>
               <rdg>ę</rdg>
            </app>RFULOLDTEXT</p>
         </div>

CodePudding user response:

I don't think you need recursion, if I've understood what you're trying to do. Here's how I might attack the problem:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs tei" version="3.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <xsl:template match="p/text()">
    <!-- find the apps that apply to the current text node -->
    <xsl:variable name="apps" select="ancestor::div/app"/>
    
    <!-- parse the text node into a sequence of 1-char strings
      This weird trick uses string-to-codepoints() to tokenize 
      the string into a sequence of character codepoints, and
      then uses codepoints-to-string() to turn each integer 
      codepoint back into a 1-char string, yielding a sequence
      of characters 
    -->
    <xsl:variable name="characters" select="
      for $codepoint in 
        string-to-codepoints(.) 
      return 
        codepoints-to-string($codepoint)
    "/>

    <!-- for each character, output a matching app if
      there is one, or otherwise the character itself
    -->
    <xsl:for-each select="$characters">
      <xsl:variable name="character" select="."/>
      <xsl:variable name="app" select="$apps[lem = $character]"/>
      <xsl:choose>
        <xsl:when test="$app">
          <xsl:copy-of select="$app"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>

  <!-- discard app elements -->
  <xsl:template match="app"/>

  <!-- copy everything else -->
  <xsl:mode on-no-match="shallow-copy"/>

</xsl:stylesheet>

Applied to:

<div xmlns="http://www.tei-c.org/ns/1.0">
  <p>SOMEWONDERFULOLDTEXT</p>
  <app>
     <lem>O</lem>
     <rdg>Ø</rdg>
  </app>
  <app>
     <lem>W</lem>
     <rdg>V</rdg>
  </app>
  <app>
     <lem>O</lem>
     <rdg>Ö</rdg>
  </app>
  <app>
     <lem>E</lem>
     <rdg>Ë</rdg>
     <rdg>ę</rdg>
  </app>
</div>

Produces result:

<div xmlns="http://www.tei-c.org/ns/1.0">
  <p>S<app>
     <lem>O</lem>
     <rdg>Ø</rdg>
  </app><app>
     <lem>O</lem>
     <rdg>Ö</rdg>
  </app>M<app>
     <lem>E</lem>
     <rdg>Ë</rdg>
     <rdg>ę</rdg>
  </app><app>
     <lem>W</lem>
     <rdg>V</rdg>
  </app><app>
     <lem>O</lem>
     <rdg>Ø</rdg>
  </app><app>
     <lem>O</lem>
     <rdg>Ö</rdg>
  </app>ND<app>
     <lem>E</lem>
     <rdg>Ë</rdg>
     <rdg>ę</rdg>
  </app>RFUL<app>
     <lem>O</lem>
     <rdg>Ø</rdg>
  </app><app>
     <lem>O</lem>
     <rdg>Ö</rdg>
  </app>LDT<app>
     <lem>E</lem>
     <rdg>Ë</rdg>
     <rdg>ę</rdg>
  </app>XT</p>
  
  
  
  
</div>
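
Note that this result inserts every app whose lem matches, at every occurrence of that character, which is not the order-sensitive output requested in the question. If each app should be used exactly once, in document order, one possible variation is to walk the characters with xsl:iterate and carry the list of not-yet-used apps along as a parameter. This is only a sketch under some assumptions: an XSLT 3.0 processor, a lem that holds exactly one character, and the parameter name pending, which is made up for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- copy everything that has no more specific template -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="p/text()">
    <!-- all apps of the enclosing div, in document order -->
    <xsl:variable name="apps" select="ancestor::div/app"/>

    <!-- split the text node into a sequence of 1-char strings -->
    <xsl:variable name="characters" select="
      for $codepoint in string-to-codepoints(.)
      return codepoints-to-string($codepoint)"/>

    <xsl:iterate select="$characters">
      <!-- apps not yet consumed (hypothetical parameter name) -->
      <xsl:param name="pending" select="$apps"/>
      <xsl:choose>
        <!-- only the NEXT pending app may match the current character -->
        <xsl:when test="$pending[1]/lem = .">
          <xsl:copy-of select="$pending[1]"/>
          <!-- drop the consumed app before moving to the next character -->
          <xsl:next-iteration>
            <xsl:with-param name="pending" select="tail($pending)"/>
          </xsl:next-iteration>
        </xsl:when>
        <xsl:otherwise>
          <!-- no match: emit the character and keep the pending list as is -->
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:template>

  <!-- discard the original app elements outside p -->
  <xsl:template match="app"/>

</xsl:stylesheet>
```

With the sample input, the first O consumes the Ø app, the E in SOME is left alone because the next pending app is W, and the remaining apps are placed at W, the second O, and the E in WONDERFUL. Any apps still pending when the text runs out are silently dropped, and a substring-valued lem would need a different matching step.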

CodePudding user response:

You could borrow the Muenchian grouping technique from XSLT 1.0 to determine whether an occurrence of a character is the first one in the string.

I would also use a key to link each character to its replacement app.

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:key name="char" match="char" use="." />
<xsl:key name="app" match="app" use="lem" />

<xsl:template match="div">
    <xsl:variable name="div" select="." />
    <xsl:variable name="chars">
        <xsl:analyze-string select="p" regex=".">
            <xsl:matching-substring>
                <char>
                    <xsl:value-of select="." />
                </char>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:variable>
    <!-- output -->
    <div>
        <xsl:for-each select="$chars/char">
            <xsl:variable name="app" select="key('app', ., $div)" />
            <xsl:choose>
                <xsl:when test="count(. | key('char', .)[1]) = 1 and $app">
                    <xsl:copy-of select="$app[1]"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="." />
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
    </div>
</xsl:template>

</xsl:stylesheet>

Applied to your input example, this will return:

Result

<?xml version="1.0" encoding="UTF-8"?>
<div>S<app>
      <lem>O</lem>
      <rdg>Ø</rdg>
   </app>M<app>
      <lem>E</lem>
      <rdg>Ë</rdg>
      <rdg>ę</rdg>
   </app>
   <app>
      <lem>W</lem>
      <rdg>V</rdg>
   </app>ONDERFULOLDTEXT</div>

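This is slightly different from the result you show: only the first occurrence of each character is replaced, by the first app whose lem matches it, and the text is written directly into the div rather than into a p. I suspect it is the correct one.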

Note that by counting the $app variable you can easily detect multiple matches and handle them as you want.
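
As a rough illustration of that counting idea, the following standalone XSLT 2.0 stylesheet reports every lem value that is shared by more than one app. It is only a sketch: it assumes the same no-namespace input as above, reuses the app key, and the report wording is made up.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- same key as in the stylesheet above: app elements indexed by their lem -->
  <xsl:key name="app" match="app" use="lem"/>

  <xsl:template match="/">
    <!-- remember the document node: inside a for-each over strings
         there is no context node for key() to search from -->
    <xsl:variable name="doc" select="."/>
    <xsl:for-each select="distinct-values(//app/lem)">
      <xsl:variable name="matches" select="key('app', ., $doc)"/>
      <!-- more than one app for the same character: report it -->
      <xsl:if test="count($matches) gt 1">
        <xsl:value-of select="concat('lem ', ., ' has ', count($matches), ' app elements&#10;')"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input this should flag O, which has two app elements. In the main stylesheet the same count($app) test could decide, for example, whether to copy $app[1], a later item of $app, or all of them.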
