Below is the HTML I'm working with. I've removed lines that aren't relevant to this question, such as the contents of the tables.
My goal is to capture each name together with the information in its table, so that each name/table pair becomes one row.
<div >
<div >
<span >
Contact Person
</span>
</div>
<div style="display: table-cell;">
Name A
<table>
</table>
Name B
<table>
</table>
</div>
</div>
I currently have the XPath '//div[@]/div/span[@][contains(text(),"Contact Person")]/ancestor::div/div[@]/table', which I can loop over to extract the information in each table.
My issue is how to capture the name for each table. I'm finding this difficult because both names are text directly inside the same div, next to the tables.
I have tried './ancestor::div[1]/text()', but this captures both names at once.
Any help is greatly appreciated.
CodePudding user response:
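preceding-sibling::text()[1] returns the nearest sibling text node that comes before the context node. If each table element is used as the context node in turn, it returns the text nodes Name A and Name B respectively. The returned text includes the surrounding whitespace and line breaks, so trim it (for example with normalize-space()) before using it.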
NB: I don't know which web scraping tool you are using, but some of them have XPath APIs that return only elements, not text nodes. If that's the case for you, you may need to switch to an XPath API that can return text nodes, e.g. lxml: https://lxml.de/xpathxslt.html#xpath
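As a rough illustration of the approach with lxml, here is a minimal sketch. It assumes the markup looks like the snippet in the question; the sample cell values (a1, b1), the function name names_with_tables, and the simplified table-selecting XPath are made up for the example, because the original attribute filters were stripped from the post. Substitute your own XPath for selecting the tables.

```python
from lxml import html

# Hypothetical sample modelled on the question's HTML; the cell
# contents (a1, b1) are invented purely for illustration.
SAMPLE = """
<div>
  <div><span>Contact Person</span></div>
  <div style="display: table-cell;">
    Name A
    <table><tr><td>a1</td></tr></table>
    Name B
    <table><tr><td>b1</td></tr></table>
  </div>
</div>
"""


def names_with_tables(markup):
    """Return one (name, cells) tuple per table in the markup."""
    root = html.fromstring(markup)
    # Assumed stand-in for the question's XPath: find the div holding the
    # "Contact Person" span, then the tables in the div that follows it.
    tables = root.xpath(
        '//div[span[contains(text(), "Contact Person")]]'
        '/following-sibling::div/table'
    )
    rows = []
    for table in tables:
        # Nearest text node before this table, among its siblings.
        text_nodes = table.xpath('preceding-sibling::text()[1]')
        # The text node carries indentation and newlines, so strip it.
        name = text_nodes[0].strip() if text_nodes else ""
        cells = [td.text_content().strip() for td in table.xpath('.//td')]
        rows.append((name, cells))
    return rows


rows = names_with_tables(SAMPLE)
print(rows)
```

Each table is used as the context node for the relative XPath, so the lookup pairs each name with its own table instead of returning every text node in the parent div.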