XSLT to extract text from mixed content?

Time:11-21

Imagine I have the following XML:

<schools>
  <school>
    <description>Helpful description</description>
  </school>
  <school>
    <description>Another <important>really important</important> description</description>
  </school>
  <school>
    <description><important>WARNING</important> this is stupid</description>
  </school>
</schools>

I want to get the whole description element as text, like this:

Helpful description
Another really important description
WARNING this is stupid

I'm able to get the text() before the first occurrence of <important/>, but not after. I've tried something like this:

    <xsl:template match="description" mode="important_recursive">
        <xsl:value-of select="node()/preceding-sibling::text()"/>
        <xsl:apply-templates select="important" mode="important_recursive"/>
    </xsl:template>

    <xsl:template match="important" mode="important_recursive">
        <span><xsl:value-of select="text()"/></span>
    </xsl:template>

CodePudding user response:

The string-value of an element is the concatenation of all its descendant text nodes in document order, so xsl:value-of on description already includes the text inside important. To get the result you show (as text), you can simply do:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>

<xsl:template match="/schools">
    <xsl:for-each select="school">
        <xsl:value-of select="description"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
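
If the goal is instead to keep some markup around the important parts, as the span in the question's attempt suggests, a minimal sketch could apply templates to the child nodes of description rather than taking its string-value. The p wrapper and the html output method are assumptions made for illustration; they are not part of the question.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Assumption: HTML output is wanted; the question does not say so. -->
<xsl:output method="html" encoding="UTF-8"/>

<xsl:template match="/schools">
    <xsl:for-each select="school">
        <!-- Hypothetical wrapper element, one per description. -->
        <p>
            <!-- Process text and element children in document order. -->
            <xsl:apply-templates select="description/node()"/>
        </p>
    </xsl:for-each>
</xsl:template>

<!-- Wrap the string-value of each important element in a span. -->
<xsl:template match="important">
    <span><xsl:value-of select="."/></span>
</xsl:template>

</xsl:stylesheet>
```

Text nodes need no template of their own here: the built-in rule for text() copies them, so the text before and after each important element is kept in place.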

CodePudding user response:

As @MartinHonnen mentioned, an empty transformation,

<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
</xsl:stylesheet>

will output the text via the built-in templates:


  
    Helpful description
  
  
    Another really important description
  
  
    WARNING this is stupid
  

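The spurious whitespace comes from the whitespace-only text nodes between the elements, which the built-in templates copy along with everything else. To eliminate it, see @michael.hor257's answer, or use xsl:strip-space with a simple template matching description to place line breaks where desired without a loop,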

<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="description">
    <xsl:value-of select="."/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

to get the requested text output:

Helpful description
Another really important description
WARNING this is stupid
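
For reference, the empty transformation above works because of the built-in template rules of XSLT 1.0. Written out explicitly for the default mode, they behave roughly like the following sketch, which should produce the same whitespace-laden output as the empty stylesheet:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Root node and elements: recurse into the child nodes. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the value through. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: output nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Since xsl:apply-templates without a select attribute visits only child nodes, attributes are never reached by these rules; only text nodes end up in the output.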