I want to extract the text content from the HTML tag below, but the <sup> tag is preventing me from getting the desired text.
The text I want to extract is simply (4:6, 6:7). How can I extract this text while skipping the <sup> tag?
I tried "//p/text()", but I am only getting the part before the <sup> tag: (4:6, 6
My HTML:
<p ><span >Final result </span><strong>0:2</strong> (4:6, 6<sup>5</sup>:7)</p>
CodePudding user response:
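//p/text() selects only the text nodes that are direct children of p; text inside child tags such as <sup> is skipped. There are two such nodes here, ' (4:6, 6' and ':7)', so take all of them with getall() and join them instead of reading only the first one.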
scrapy shell file:///path/to/file.html
In [1]: ''.join(response.xpath('//p[@]/text()').getall())
Out[1]: ' (4:6, 6:7)'
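To see which nodes count as "direct text" without starting a Scrapy shell, here is a minimal sketch using only Python's standard library. It assumes the snippet from the question is well-formed XML (it happens to be; real pages often are not), and direct_text is a hypothetical helper name, not part of Scrapy. In ElementTree, the direct text nodes of an element are its own .text plus the .tail of each child.

```python
import xml.etree.ElementTree as ET

# Snippet from the question; parsed as XML purely for illustration.
html = '<p ><span >Final result </span><strong>0:2</strong> (4:6, 6<sup>5</sup>:7)</p>'
p = ET.fromstring(html)


def direct_text(elem):
    """Join the text nodes that are direct children of elem (hypothetical helper)."""
    # elem.text is the text before the first child; each child's .tail is
    # the text that follows that child but still belongs to elem.
    parts = [elem.text or '']
    parts.extend(child.tail or '' for child in elem)
    return ''.join(parts)


result = direct_text(p)
print(repr(result))          # ' (4:6, 6:7)'
print(repr(result.strip()))  # '(4:6, 6:7)'
```

The text inside <span>, <strong> and <sup> never appears because it belongs to those child elements, which is the same reason text() skips it. Call .strip() on the joined string if the leading space is unwanted.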
CodePudding user response:
Try:
response.xpath('//*[@]//following-sibling::sup/./..//text()').getall()
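Keep in mind that a descendant text step (the //text() at the end of a path) also matches text inside child tags, including the <sup>. A small stdlib sketch of the difference, assuming the question's snippet parses as XML and using ElementTree's itertext() as a stand-in for a descendant text() step (an illustration only, not the Scrapy call itself):

```python
import xml.etree.ElementTree as ET

# Snippet from the question; parsed as XML purely for illustration.
html = '<p ><span >Final result </span><strong>0:2</strong> (4:6, 6<sup>5</sup>:7)</p>'
p = ET.fromstring(html)

# itertext() walks every descendant text node in document order,
# so the text of <span>, <strong> and <sup> is included as well.
all_text = ''.join(p.itertext())
print(repr(all_text))  # 'Final result 0:2 (4:6, 65:7)'
```

The "5" from the <sup> ends up inside the result, so a descendant selection needs extra filtering if only (4:6, 6:7) is wanted.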