Using XSL to check contents of specific nodes, and alter when condition is met

Time:06-23

I'm pretty new to using XSL/XSLT to perform XML transformations, and have a scenario I'm looking for some help with.

The TL;DR summary of the problem: I am working on a C# solution to escape certain characters in MathML, specifically in <mtext> nodes. The characters include, but are not necessarily limited to, {, }, [, and ], which need to become \{, \}, \[, and \] respectively. Having seen some of the interesting things people have done with XSLT transformations, I figured I would give that a shot.

For reference, here's a sample block of MathML:

<math style='font-family:Times New Roman' xmlns='http://www.w3.org/1998/Math/MathML'>
    <mstyle mathsize='15px'>
        <mrow>
            <mtext>4&#160;___&#160;{</mtext>
            <mtext mathvariant='italic'>x</mtext>
            <mtext>:&#160;</mtext>
            <mtext mathvariant='italic'>x</mtext>
            <mtext>&#160;is&#160;a&#160;natural&#160;number&#160;greater&#160;than&#160;4}</mtext>
        </mrow>
    </mstyle>
</math>

Fiddling around, I have found that using this XSL, I can print out the contents of each <mtext> element:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/">
        <xsl:for-each select="//*[local-name()='mtext']">
            <xsl:variable name="myMTextVal" select="text()" />
            <xsl:message terminate="no">
                <xsl:value-of select="$myMTextVal"/>
            </xsl:message>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

My first thought, which quickly seemed to be the wrong road to go down, was to use translate() in the for-each loop as an XSLT 1.0 stand-in for XSLT 2.0's replace():

<!-- Outside of the looping template -->
<xsl:param name="braceOpen" select="'{'" />
<xsl:param name="braceOpenReplace" select="'\{'" />

<!-- In the loop itself -->
<xsl:value-of select="translate(//*[local-name()='mtext']/text(), $braceOpen, $braceOpenReplace)"/>

The limitation of translate(), which only maps single characters 1:1, quickly became apparent when the first mtext's content came out as "4 ___ \" rather than "4 ___ \{".
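To make the behaviour described above concrete, here is a minimal standalone sketch (the literal input string is just the first mtext value with a plain space; it is not part of the original stylesheet). translate() pairs characters by position, so only the first character of the replacement string is ever used for '{':

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- '{' (1st char of the 2nd argument) maps to '\' (1st char of the 3rd argument).
             The trailing '{' in the 3rd argument has no partner and is ignored,
             so the result is: 4 ___ \ -->
        <xsl:value-of select="translate('4 ___ {', '{', '\{')"/>
    </xsl:template>
</xsl:stylesheet>
```

Note also that passing a node-set such as //*[local-name()='mtext']/text() to translate() converts it to the string value of its first node only, which would explain why only the first mtext showed up.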

So digging some more, I ran across these threads:

XSLT string replace

XSLT Replace function not found

both of which offered an alternative solution in lieu of replace(). So I set up a test of:

<xsl:template name="ProcessMathText">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="by"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$by"/>
            <xsl:call-template name="ProcessMathText">
                <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="by" select="$by"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

and placed this within the for-each block:

<xsl:otherwise>
    <xsl:variable name="mTextText" select="Text" />
    <xsl:call-template name="ProcessMathText">
        <xsl:with-param name="text" select="$mTextText"/>
        <xsl:with-param name="replace" select="'{'"/>
        <xsl:with-param name="by" select="'\{'"/>
    </xsl:call-template>
</xsl:otherwise>

However, that began to throw "'xsl:otherwise' cannot be a child of the 'xsl:for-each' element." errors. Ultimately, I'm not 100% sure how to "invoke" the <xsl:otherwise> content from the links above without putting it inside the for-each block, which is how I'm wired to think based on my history with AS, JS, Python, and C#. I was hoping someone might be able to help me out, or point me in a direction that might yield results, rather than me just banging my head against a wall.
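For reference, xsl:otherwise is only valid as a child of xsl:choose, so the wrapper is what triggers the error, not the xsl:call-template inside it. A minimal sketch of one way to invoke the named template without a for-each, by letting a template rule match the text nodes, could look like this. The prefix m is an arbitrary choice for this sketch, and only '{' is handled, as in the test above:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="http://www.w3.org/1998/Math/MathML">

    <!-- identity transform: copies every node that no other template matches -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- runs once for each text node directly inside an mtext element;
         "." is the current text node (select="Text" would look for a child element named Text) -->
    <xsl:template match="m:mtext/text()">
        <xsl:call-template name="ProcessMathText">
            <xsl:with-param name="text" select="."/>
            <xsl:with-param name="replace" select="'{'"/>
            <xsl:with-param name="by" select="'\{'"/>
        </xsl:call-template>
    </xsl:template>

    <!-- recursive string replace, same logic as the ProcessMathText template above -->
    <xsl:template name="ProcessMathText">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="by"/>
        <xsl:choose>
            <xsl:when test="contains($text, $replace)">
                <xsl:value-of select="substring-before($text, $replace)"/>
                <xsl:value-of select="$by"/>
                <xsl:call-template name="ProcessMathText">
                    <xsl:with-param name="text" select="substring-after($text, $replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="by" select="$by"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Because the identity template copies the mtext element and its attributes, and the more specific rule only replaces the text inside it, the rest of the MathML structure passes through unchanged.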

One other possible issue I have noticed in the output: the transformation appears to lose character references such as &#160;, replacing them with the literal character " ". I do not want that, as it could cause some annoying headaches down the line. Is there a way to maintain the structure and only replace specific content, without accidentally replacing or, in a sense, "rendering" the character references?

Thanks in advance for your help!

CodePudding user response:

It is difficult to understand what your question actually is.

In order to escape "certain characters" in mtext nodes, consider the following simplified example:

XML

<math xmlns="http://www.w3.org/1998/Math/MathML">
    <mstyle mathsize="15px">
        <mrow>
            <mtext>some text{with} all kinds of (brackets) in it</mtext>
            <mtext>a different {example}</mtext>
            <mtext>no change expected here</mtext>
            <mtext>({tough one?})</mtext>
        </mrow>
    </mstyle>
</math>

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:math="http://www.w3.org/1998/Math/MathML">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="math:mtext">
    <xsl:copy>
        <!-- keep any attributes on mtext (e.g. mathvariant) -->
        <xsl:apply-templates select="@*"/>
        <xsl:call-template name="escape-chars">
            <xsl:with-param name="string" select="."/>
        </xsl:call-template>
    </xsl:copy>
</xsl:template>

<xsl:template name="escape-chars">
    <xsl:param name="string"/>
    <xsl:param name="chars">(){}</xsl:param>
    <xsl:choose>
        <xsl:when test="$chars">
            <xsl:variable name="char" select="substring($chars, 1, 1)" />
            <xsl:choose>
                <xsl:when test="contains($string, $char)">
                    <!-- process substring-before with the remaining chars -->
                    <xsl:call-template name="escape-chars">
                        <xsl:with-param name="string" select="substring-before($string, $char)"/>
                        <xsl:with-param name="chars" select="substring($chars, 2)"/>
                    </xsl:call-template>
                    <!-- escape matched char -->
                    <xsl:value-of select="concat('\', $char)"/>
                    <!-- continue with substring-after -->
                    <xsl:call-template name="escape-chars">
                        <xsl:with-param name="string" select="substring-after($string, $char)"/>
                        <xsl:with-param name="chars" select="$chars"/>
                    </xsl:call-template>
                </xsl:when>
                <xsl:otherwise>
                    <!-- pass the entire string for processing with the remaining chars -->
                    <xsl:call-template name="escape-chars">
                        <xsl:with-param name="string" select="$string"/>
                        <xsl:with-param name="chars" select="substring($chars, 2)"/>
                    </xsl:call-template>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$string"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Result

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
   <mstyle mathsize="15px">
      <mrow>
         <mtext>some text\{with\} all kinds of \(brackets\) in it</mtext>
         <mtext>a different \{example\}</mtext>
         <mtext>no change expected here</mtext>
         <mtext>\(\{tough one?\}\)</mtext>
      </mrow>
   </mstyle>
</math>
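On the side question about &#160;: once the input has been parsed, a character reference and the literal character are indistinguishable to the processor, so the original spelling cannot be "kept" as such. What may help is asking the serializer for an ASCII output encoding, so that it has to write every non-ASCII character as a numeric character reference. A minimal sketch, assuming the processor serializes the result itself (for example to a file or stream) rather than handing it to a DOM or to a writer that imposes its own encoding:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- U+00A0 cannot be represented in US-ASCII, so the serializer is expected
         to emit it as a character reference instead of the raw character -->
    <xsl:output method="xml" encoding="us-ascii"/>

    <!-- identity transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Whether the reference comes out as &#160; or &#xA0; is up to the serializer; both denote the same character. The same xsl:output setting can be used in the stylesheet above in place of encoding="UTF-8".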

CodePudding user response:

In XSLT 3.0 it would look like this:

<xsl:stylesheet version="3.0"  
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:math="http://www.w3.org/1998/Math/MathML">

    <xsl:output method="xml" indent="yes"/>
    <xsl:mode on-no-match="shallow-copy"/>
    
    <xsl:template match="math:mtext">
      <xsl:copy>
        <!-- keep any attributes on mtext (e.g. mathvariant) -->
        <xsl:apply-templates select="@*"/>
        <xsl:value-of select=". => replace('[', '\[', 'q')
                                => replace(']', '\]', 'q')
                                => replace('{', '\{', 'q')
                                => replace('}', '\}', 'q')"/>
      </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>
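The 'q' flag above makes replace() treat both the pattern and the replacement as literal strings, which is why the brackets and backslashes need no further escaping. The arrow operator and the 'q' flag are not available in XSLT 2.0; there, a single regular expression with a capture group might do the same job. A sketch under that assumption, covering only the four characters named in the question:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://www.w3.org/1998/Math/MathML">

    <xsl:output method="xml" indent="yes"/>

    <!-- identity transform (xsl:mode is XSLT 3.0 only) -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="math:mtext">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- ([\[\]{}]) captures one of [ ] { } as group 1;
                 in the replacement, \\ is a literal backslash and $1 is the captured character -->
            <xsl:value-of select="replace(., '([\[\]{}])', '\\$1')"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

With this rule, an mtext containing "a {b}" would come out as "a \{b\}".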