How to turn <revst/> ... element(s) ... <revend/> into elements with attribute set (revs

Time:02-06

I want to turn this airplane manual SGML/XML input, which contains revst/revend tags:

<em>
  <revst/>
  <prclist>
    <title>This list of steps was added</title> 
  </prclist>
  <prclist>
    <title>Another list of steps was added</title> 
  </prclist>
  <revend/>

  <chapter>
    <WARNING>
      <revst/>
      <PARA>First changed paragraph showing revst at deeper depth.</PARA>
    </WARNING>
    <PARA>Second changed paragraph showing revst at deeper depth.</PARA>
    <revend/>
  </chapter>

  <listitem>
    <revst/>
    <PARA>First changed paragraph showing revst at higher depth</PARA>
    <NOTE>
      <PARA>Second changed paragraph showing revst at higher depth</PARA>
      <revend/>
    </NOTE>
  </listitem>

  <prclist>
    <title>This list of steps was unchanged</title> 
  </prclist>
    
  <para>
    Some text
    <revst/>and some changed text here.<revend/>
    This text didn't change.
  </para>
</em>

Into this:

<em>
  <prclist revised="1">
    <title revised="1">This list of steps was added</title> 
  </prclist>
  <prclist revised="1">
    <title revised="1">Another list of steps was added</title> 
  </prclist>

  <chapter>
    <WARNING>
      <PARA revised="1">First changed paragraph showing revst at deeper depth.</PARA>
    </WARNING>
    <PARA revised="1">Second changed paragraph showing revst at deeper depth.</PARA>
  </chapter>

  <listitem>
    <PARA revised="1">First changed paragraph showing revst at higher depth</PARA>
    <NOTE revised="1">
      <PARA revised="1">Second changed paragraph showing revst at higher depth</PARA>
    </NOTE>
  </listitem>

  <prclist>
    <title>This list of steps was unchanged</title> 
  </prclist>

  <para>
    Some text
    <span revised="1">and some changed text here.</span>
    This text didn't change.
  </para>
</em>

Reason: I believe that setting a "revised" attribute on every affected element in a first processing pass will make the final HTML conversion in a second pass easier. If this first pass is not easy or clean to do in XSLT 3, I will just write a program to do it.

The final goal is to have a background color set in HTML for all "revised" elements/text.

Assume that revst/revend pairs cannot overlap each other in the input document.
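
To make the two-pass idea concrete, here is a rough Python sketch of only the styling part of the second pass. It is an illustration, not the real HTML conversion: the function name highlight_revised, the colour, and the use of an inline style attribute are all assumptions, and the mapping from manual elements to HTML elements is left out entirely.

```python
from xml.dom import minidom

# Assumed highlight colour; the real stylesheet or CSS may use something else.
HIGHLIGHT = "background-color: #ffff99"

def highlight_revised(xml_text):
    """Replace revised="1" with an inline background colour (hypothetical helper)."""
    doc = minidom.parseString(xml_text)
    for el in doc.getElementsByTagName("*"):
        if el.getAttribute("revised") == "1":
            el.removeAttribute("revised")
            el.setAttribute("style", HIGHLIGHT)
    return doc.documentElement.toxml()

print(highlight_revised('<para>Some <span revised="1">changed</span> text</para>'))
```

Because every revised element already carries the attribute after the first pass, this step needs no knowledge of where the revst/revend markers were.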

CodePudding user response:

Wow, this came out much cleaner than I thought, using an accumulator.

This stylesheet:

<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
  >

<xsl:mode use-accumulators="revisionCheck"/>

<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="*">
  <xsl:variable name="revised" select="accumulator-before('revisionCheck')"/>
  <xsl:copy>
    <xsl:if test="$revised = 1">
      <!-- Add a "revised" attribute to this element -->
      <xsl:attribute name="revised" select="$revised"></xsl:attribute>
    </xsl:if>
    <xsl:apply-templates>
      <!-- Pass a parameter indicating if we are already inside a revised parent element.
           This is useful for eliminating redundant <spans> in text nodes. -->
      <xsl:with-param name="parent_revised" select="$revised"/>
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>

<!-- Remove these tags from the output. -->
<xsl:template match="revst | revend">
</xsl:template>

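<!-- 
  Copy text.  If it is revised and NOT already in a revised parent element, wrap it in a span.
  Whitespace-only text nodes are not matched here (the built-in rule copies them), so the
  indentation between a revst and the first revised element is not wrapped in a span.
  -->
<xsl:template match="text()[normalize-space()]">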
  <xsl:param name="parent_revised" />
  <xsl:variable name="revised" select="accumulator-before('revisionCheck')"/>
  <xsl:choose>
    <xsl:when test="$revised = 1 and $parent_revised != 1">
      <span revised="{$revised}"><xsl:value-of select="."/></span>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="."/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- 
Keep track of when we see revst and revend tags.
This seems to work even when revst is at a deeper level than the ending revend,
or vice-versa, yay! 
Note that it doesn't matter what phase we use (start or end), because the tags
don't contain any children.
-->
<xsl:accumulator name="revisionCheck" as="xs:integer" initial-value="-1" >
  <xsl:accumulator-rule match="revst" select="1"/>
  <xsl:accumulator-rule match="revend" select="0"/>
</xsl:accumulator>

</xsl:stylesheet>

Produces this output, just what I wanted:

<?xml version="1.0" encoding="UTF-8"?>
<em>
  <prclist revised="1">
    <title revised="1">This list of steps was added</title>
  </prclist>
  <prclist revised="1">
    <title revised="1">Another list of steps was added</title>
  </prclist>
  <chapter>
    <WARNING>
      <PARA revised="1">First changed paragraph showing revst at deeper depth.</PARA>
    </WARNING>
    <PARA revised="1">Second changed paragraph showing revst at deeper depth.</PARA>
  </chapter>
  <listitem>
    <PARA revised="1">First changed paragraph showing revst at higher depth</PARA>
    <NOTE revised="1">
      <PARA revised="1">Second changed paragraph showing revst at higher depth</PARA>
    </NOTE>
  </listitem>
  <prclist>
    <title>This list of steps was unchanged</title>
  </prclist>
  <para>
    Some text
    <span revised="1">and some changed text here.</span>
    This text didn't change.
  </para>
</em>
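
For readers who want to see why the accumulator handles revst and revend at different depths, the same idea can be sketched outside XSLT: walk the tree in document order and carry a single flag that revst switches on and revend switches off. The Python below is only an approximation of the stylesheet above, not a replacement for it; the name mark_revisions is made up for this sketch, and it assumes the input is well-formed XML with non-overlapping pairs.

```python
from xml.dom import minidom

def mark_revisions(xml_text):
    """Mimic the accumulator: one flag carried through a document-order walk."""
    doc = minidom.parseString(xml_text)
    revised = False  # plays the role of the accumulator value

    def visit(node, parent_revised):
        nonlocal revised
        for child in list(node.childNodes):  # snapshot, since we edit the tree
            if child.nodeType == child.ELEMENT_NODE:
                if child.tagName == "revst":
                    revised = True
                    node.removeChild(child)
                elif child.tagName == "revend":
                    revised = False
                    node.removeChild(child)
                else:
                    here = revised  # flag value when the element starts
                    if here:
                        child.setAttribute("revised", "1")
                    visit(child, here)
            elif child.nodeType == child.TEXT_NODE:
                # Wrap revised text only if its parent is not already marked.
                if revised and not parent_revised and child.data.strip():
                    span = doc.createElement("span")
                    span.setAttribute("revised", "1")
                    node.replaceChild(span, child)
                    span.appendChild(child)

    visit(doc, False)
    return doc.documentElement.toxml()

print(mark_revisions("<c><W><revst/><P>a</P></W><P>b</P><revend/><P>c</P></c>"))
```

The flag is independent of nesting depth, which is why a revst inside WARNING can be closed by a revend one level up.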

CodePudding user response:

To identify the nodes "inside" <revst/>..<revend/>, you can use nested for-each-group instructions: group-starting-with on the outside and group-ending-with on the inside. With XSLT 3 you can store the groups in a variable as a sequence of arrays and push that variable as a tunnel parameter through templates that check whether a node is part of a group and, if so, add the attribute:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all">
  
  <xsl:template match="/*">
    <xsl:variable name="rev-groups" as="array(node())*">
      <xsl:for-each-group select="descendant::node()" group-starting-with="revst">
        <xsl:if test="self::revst">
          <xsl:for-each-group select="tail(current-group())" group-ending-with="revend">
            <xsl:if test="current-group()[last()][self::revend]">
              <xsl:sequence select="array{ current-group()[position() lt last()] }"/>
            </xsl:if>
          </xsl:for-each-group>
        </xsl:if>
      </xsl:for-each-group>      
    </xsl:variable>
    <xsl:copy>
      <xsl:apply-templates select="@*, node()">
        <xsl:with-param name="rev-groups" tunnel="yes" select="$rev-groups"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:template match="text()[normalize-space()]">
    <xsl:param name="rev-groups" tunnel="yes"/>
    <xsl:choose>
      <xsl:when test=". intersect $rev-groups?1">
        <span revised="1">
          <xsl:next-match/>
        </span>
      </xsl:when>
      <xsl:otherwise>
        <xsl:next-match/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
  <xsl:template match="*">
    <xsl:param name="rev-groups" tunnel="yes"/>
    <xsl:choose>
      <xsl:when test=". intersect $rev-groups?*">
        <xsl:copy>
          <xsl:attribute name="revised" select="1"/>
          <xsl:apply-templates select="@*, node()"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:next-match/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
  <xsl:template match="revst | revend"/>

</xsl:stylesheet>

I think the code works well for adding the revised attribute to elements. The code that wraps other nodes in a span with revised="1" is probably not going to work as posted if comments or processing instructions occur as well. I am also not sure whether that part of the requirement is clearly specified by the single example.
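
As a rough illustration of what the nested grouping collects, here is the same step in Python: flatten the descendants in document order, open a group at each revst, and keep the group only if a matching revend closes it. This is a sketch of the idea, not a translation of the stylesheet; revision_groups is a name invented for the example.

```python
from xml.dom import minidom

def revision_groups(xml_text):
    """Return one list of nodes per completed revst..revend range."""
    doc = minidom.parseString(xml_text)

    def descendants(node):
        # Document-order walk, like descendant::node() in XPath.
        for child in node.childNodes:
            yield child
            yield from descendants(child)

    groups, current = [], None
    for node in descendants(doc.documentElement):
        is_element = node.nodeType == node.ELEMENT_NODE
        if is_element and node.tagName == "revst":
            current = []                 # start collecting a new group
        elif is_element and node.tagName == "revend":
            if current is not None:
                groups.append(current)   # group is complete
            current = None
        elif current is not None:
            current.append(node)
    return groups

groups = revision_groups("<c><W><revst/><P>a</P></W><P>b</P><revend/><P>c</P></c>")
print([[n.nodeName for n in g] for g in groups])
```

A group that is opened by revst but never closed is dropped, which corresponds to the test on the last item of the inner group in the stylesheet.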
