I'm pretty new to using XSL/XSLT to perform XML transformations, and have a scenario I'm looking for some help with.
TL;DR: I am working on a C# solution to escape certain characters in MathML, specifically in <mtext> nodes. The characters include, but are not necessarily limited to, {, }, [, and ], which need to become \{, \}, \[, and \] respectively. Having seen some of the interesting things people have done with XSLT transformations, I figured I would give that a shot.
For reference, here's a sample block of MathML:
<math style='font-family:Times New Roman' xmlns='http://www.w3.org/1998/Math/MathML'>
<mstyle mathsize='15px'>
<mrow>
<mtext>4 ___ {</mtext>
<mtext mathvariant='italic'>x</mtext>
<mtext>: </mtext>
<mtext mathvariant='italic'>x</mtext>
<mtext> is a natural number greater than 4}</mtext>
</mrow>
</mstyle>
</math>
Fiddling around, I found that the following XSL prints the contents of each <mtext> element:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:for-each select="//*[local-name()='mtext']">
<xsl:variable name="myMTextVal" select="text()" />
<xsl:message terminate="no">
<xsl:value-of select="$myMTextVal"/>
</xsl:message>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
My first thought, which quickly turned out to be the wrong road to go down, was to use translate() in the for-each loop as an XSLT 1.0 stand-in for XSLT 2.0's replace():
<!-- Outside of the looping template -->
<xsl:param name="braceOpen" select="'{'" />
<xsl:param name="braceOpenReplace" select="'\{'" />
<!-- In the loop itself -->
<xsl:value-of select="translate(//*[local-name()='mtext']/text(), $braceOpen, $braceOpenReplace)"/>
The limitation of translate(), which only performs 1:1 character replacement, quickly became apparent when the first mtext's content came out as "4 ___ \" rather than "4 ___ \{".
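For illustration, here is a minimal standalone sketch of that behaviour. The literal input string is simply the first mtext value from the sample above, and the stylesheet can be run against any XML input:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- translate() maps characters one-to-one by position:
         '{' (1st char of arg 2) becomes '\' (1st char of arg 3).
         The second character of arg 3 has no counterpart in arg 2
         and is ignored, so no string-for-character replacement happens. -->
    <xsl:value-of select="translate('4 ___ {', '{', '\{')"/>
  </xsl:template>
</xsl:stylesheet>
```

This produces "4 ___ \", which is why translate() cannot be used to insert a backslash in front of a character.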
So digging some more, I ran across these threads:
XSLT Replace function not found
both of which offered an alternative solution in lieu of replace(). So I set up a test of:
<xsl:template name="ProcessMathText">
<xsl:param name="text"/>
<xsl:param name="replace"/>
<xsl:param name="by"/>
<xsl:choose>
<xsl:when test="contains($text,$replace)">
<xsl:value-of select="substring-before($text,$replace)"/>
<xsl:value-of select="$by"/>
<xsl:call-template name="ProcessMathText">
<xsl:with-param name="text" select="substring-after($text,$replace)"/>
<xsl:with-param name="replace" select="$replace"/>
<xsl:with-param name="by" select="$by"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
and placed this within the for-each block:
<xsl:otherwise>
<xsl:variable name="mTextText" select="Text" />
<xsl:call-template name="ProcessMathText">
<xsl:with-param name="text" select="$mTextText"/>
<xsl:with-param name="replace" select="'{'"/>
<xsl:with-param name="by" select="'\{'"/>
</xsl:call-template>
</xsl:otherwise>
However, that began to throw "'xsl:otherwise' cannot be a child of the 'xsl:for-each' element." errors. Ultimately, I'm not 100% sure how to "invoke" the <xsl:otherwise> content from the links above without it being inside the for-each block, which is how I'm wired to think given my history with AS, JS, Python, and C#. I was hoping someone could help me out, or point me in a direction that might yield results, rather than me just banging my head against a wall.
One other possible issue I have noticed on the output... It looks like the transformation results in losing the HTML entity characters such as  , and having them replaced with " ", which is something I do not want, as that could cause some annoying headaches down the line. Is there a way to maintain the structure, and only replace specific content, without accidentally replacing or in a sense "rendering" HTML entities?
Thanks in advance for your help!
CodePudding user response:
It is difficult to understand what your question actually is.
In order to escape "certain characters" in mtext nodes, consider the following simplified example:
XML
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mstyle mathsize="15px">
<mrow>
<mtext>some text{with} all kinds of (brackets) in it</mtext>
<mtext>a different {example}</mtext>
<mtext>no change expected here</mtext>
<mtext>({tough one?})</mtext>
</mrow>
</mstyle>
</math>
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:math="http://www.w3.org/1998/Math/MathML">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="math:mtext">
<xsl:copy>
<!-- keep attributes such as mathvariant -->
<xsl:copy-of select="@*"/>
<xsl:call-template name="escape-chars">
<xsl:with-param name="string" select="."/>
</xsl:call-template>
</xsl:copy>
</xsl:template>
<xsl:template name="escape-chars">
<xsl:param name="string"/>
<xsl:param name="chars">(){}</xsl:param>
<xsl:choose>
<xsl:when test="$chars">
<xsl:variable name="char" select="substring($chars, 1, 1)" />
<xsl:choose>
<xsl:when test="contains($string, $char)">
<!-- process substring-before with the remaining chars -->
<xsl:call-template name="escape-chars">
<xsl:with-param name="string" select="substring-before($string, $char)"/>
<xsl:with-param name="chars" select="substring($chars, 2)"/>
</xsl:call-template>
<!-- escape matched char -->
<xsl:value-of select="concat('\', $char)"/>
<!-- continue with substring-after -->
<xsl:call-template name="escape-chars">
<xsl:with-param name="string" select="substring-after($string, $char)"/>
<xsl:with-param name="chars" select="$chars"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<!-- pass the entire string for processing with the remaining chars -->
<xsl:call-template name="escape-chars">
<xsl:with-param name="string" select="$string"/>
<xsl:with-param name="chars" select="substring($chars, 2)"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$string"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Result
<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mstyle mathsize="15px">
<mrow>
<mtext>some text\{with\} all kinds of \(brackets\) in it</mtext>
<mtext>a different \{example\}</mtext>
<mtext>no change expected here</mtext>
<mtext>\(\{tough one?\}\)</mtext>
</mrow>
</mstyle>
</math>
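The example escapes ( ) { }, whereas the question asks about { } [ ]. Since the set of characters is just the chars parameter, one way to change it without editing the named template is sketched below. The file name escape-chars.xsl is hypothetical and assumes the stylesheet above has been saved under that name in the same directory:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://www.w3.org/1998/Math/MathML">
  <!-- Assumption: the XSLT 1.0 stylesheet shown above is saved
       next to this file as escape-chars.xsl (hypothetical name). -->
  <xsl:import href="escape-chars.xsl"/>

  <!-- This template has higher import precedence than the imported
       math:mtext template, so it is the one that gets applied. -->
  <xsl:template match="math:mtext">
    <xsl:copy>
      <!-- preserve attributes such as mathvariant -->
      <xsl:copy-of select="@*"/>
      <xsl:call-template name="escape-chars">
        <xsl:with-param name="string" select="."/>
        <!-- the characters to prefix with a backslash -->
        <xsl:with-param name="chars" select="'{}[]'"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Alternatively, the default value of the chars parameter in the named template can simply be changed from (){} to {}[].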
CodePudding user response:
In XSLT 3.0 it would look like this:
<xsl:stylesheet version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:math="http://www.w3.org/1998/Math/MathML">
<xsl:output method="xml" indent="yes"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="math:mtext">
<xsl:copy>
<!-- keep attributes such as mathvariant -->
<xsl:copy-of select="@*"/>
<xsl:value-of select=". => replace('[', '\[', 'q')
=> replace(']', '\]', 'q')
=> replace('{', '\{', 'q')
=> replace('}', '\}', 'q')"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
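If only an XSLT 2.0 processor is available (no arrow operator, no xsl:mode), the same effect can probably be had with a single regex-based replace() call. The following is a sketch under that assumption, not a tested drop-in; it matches the text nodes inside mtext rather than the element itself, so attributes are left to the identity template:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://www.w3.org/1998/Math/MathML">
  <xsl:output method="xml" indent="yes"/>

  <!-- identity transform: copies everything not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- only text directly inside mtext is rewritten -->
  <xsl:template match="math:mtext/text()">
    <!-- The character class matches any one of [ ] { }.
         In the replacement string, \\ is a literal backslash
         and $1 is the character that was matched. -->
    <xsl:value-of select="replace(., '([\[\]{}])', '\\$1')"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that both this and the XSLT 3.0 version need a processor that supports XSLT 2.0 or later, such as Saxon. The XslCompiledTransform class built into .NET implements XSLT 1.0 only, so from C# without a third-party processor the recursive named-template approach is the one that applies.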