How to ignore all the script and style tags from parent in xslt

Time:11-21

I have a sample HTML document and I'm writing an XPath expression to extract content from it. My main requirement is to ignore style and script tags wherever they appear, and I want to do this from the parent itself. Here is my test block.

 <div itemprop="articleBody"> 
   <div>Main text.</div> 
   <p>
 <style type="text/css"> 
         #pStule{ 
         font-size: 10pt; 
         line-height: 15pt; 
         } 
    </style> 
sub text.</p> 
   <style type="text/css"> 
         #dhtmltooltip{ 
         font-size: 10pt; 
         line-height: 15pt; 
         } 
    </style> 
    <script> 
         var offsetxpoint=-60; 
         var offsetypoint=20;     
    </script> 
   <p>Another subtext.</p> 
</div> 

And my XSLT, with the XPath in the select attribute, is

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:template match="/"> 
        <xsl:apply-templates select="descendant::div[@itemprop='articleBody']/descendant::*[not(descendant::style) and not(descendant::script) and not(self::style) and not(self::script)]"/> 
    </xsl:template> 
</xsl:stylesheet>

I am aware that this can be achieved with an xsl:for-each and doing the work inside it. But my program accepts only a single line of XPath, which is why I want to do it from the parent.

My current output is

Main text.Another subtext.

My expected output is

Main text.sub text.Another subtext.

Currently, my p is being ignored because it has a style tag inside it. Please let me know how I can do this.

CodePudding user response:

To avoid getting text nodes inside of script or style elements you can use e.g.

//*[not(self::script | self::style)]/text()

CodePudding user response:

Because you haven't provided your desired output, it's not completely clear what you're trying to achieve. My proposed XPath is based on the assumption that you want to extract just the text value of the document, excluding the textual content of the script and style elements.

//text()[not(parent::script)][not(parent::style)]
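
To see what this expression would select on the sample from the question, here is a minimal Python sketch that mimics the same filter by hand. It rests on a few assumptions: the sample is shortened and parsed as well-formed XML rather than HTML, the helper name visible_text_nodes is hypothetical, and the filter is reimplemented manually because the standard library's xml.etree.ElementTree does not support the parent:: axis or not(). Whitespace-only text nodes, which the XPath itself would also return, are trimmed away here for readability.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question (assumed to be well-formed XML).
SAMPLE = """<div itemprop="articleBody">
  <div>Main text.</div>
  <p><style type="text/css">#pStule{ font-size: 10pt; }</style>
sub text.</p>
  <style type="text/css">#dhtmltooltip{ font-size: 10pt; }</style>
  <script>var offsetxpoint=-60;</script>
  <p>Another subtext.</p>
</div>"""

SKIPPED = {"script", "style"}


def visible_text_nodes(elem):
    """Yield text nodes whose parent element is not script or style.

    Hand-written equivalent of //text()[not(parent::script)][not(parent::style)].
    """
    keep = elem.tag not in SKIPPED
    if keep and elem.text:
        yield elem.text
    for child in elem:
        yield from visible_text_nodes(child)
        # child.tail is the text that follows the child; its parent is `elem`,
        # so it is kept or dropped based on `elem`, not on the child.
        if keep and child.tail:
            yield child.tail


root = ET.fromstring(SAMPLE)
# Strip each node so indentation-only text nodes do not show up in the result.
result = "".join(text.strip() for text in visible_text_nodes(root))
print(result)
```

The key point is that the text "sub text." belongs to the p element, not to the style element before it, so filtering on the parent of each text node keeps it, whereas filtering out every element that contains a style descendant drops the whole p.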