XPATH to use preceding and following sibling in a single statement

Time:07-24

I would like to scrape the name and address information that sits between the tag containing the text "Defendant" and the next heading tag.

My HTML structure is:

<hr>
<H5>Defendant/Respondent Information</H5>
<span >(Each Defendant/Respondent is displayed below)</span>
<table>
<tr>
<td><span >Party Type:</span></td><td><span >Defendant</span><span >Party No.:</span><span >1</span></td>
</tr>
</table>
<table>
<tr>
<td><span >Name:</span></td><td><span >Burrell, Marvin</span></td>
</tr>
</table>
<table>
<tr>
<td><span >Address:</span></td><td><span >33 N Ann St</span></td>
</tr>
<tr>
<td><span >City:</span></td><td><span >Baltimore</span><span >State:</span><span >MD</span><span >Zip Code:</span><span >21231</span></td>
</tr>
</table>
<hr>
<table>
<tr>
<td><span >Party Type:</span></td><td><span >Defendant</span><span >Party No.:</span><span >2</span></td>
</tr>
</table>
<table>
<tr>
<td><span >Name:</span></td><td><span >Burrell, Frances  Ann</span></td>
</tr>
</table>
<table>
<tr>
<td><span >Address:</span></td><td><span >33 N Ann St</span></td>
</tr>
<tr>
<td><span >City:</span></td><td><span >Baltimore</span><span >State:</span><span >MD</span><span >Zip Code:</span><span >21231</span></td>
</tr>
</table>
<hr>
<H5>Related Persons Information</H5>
<span >(Each Related person is displayed below)</span>
<table>
<tr>
<td><span >Name:</span></td><td><span >Unwanted Name</span></td>
</tr>
</table>
<table>
<tr>
<td><span >Address:</span></td><td><span >33 N Ann St</span></td>
</tr>
<tr>
<td><span >City:</span></td><td><span >Unwanted City</span><span >State:</span><span >Unwanted city</span><span >Zip Code:</span><span >12345</span></td>
</tr>
</table>
<table></table>
<hr>

My current XPath captures the first occurrence of the name and address correctly, but when I extract multiple occurrences it also scrapes the information under the unwanted h5 tags.

My current XPath is:

"//*[contains(text(),'Defendant')]//following-sibling::table//span[text()='Name:' or text()='Business or Organization Name:']/ancestor-or-self::td/following-sibling::td//text()"

I tried including preceding-sibling and following-sibling, but nothing gives my expected output.

My current output is:

names - [
Burrell, Marvin,
Burrell, Frances  Ann,
Unwanted Name,
]

The expected output is:

[
Burrell, Marvin,
Burrell, Frances  Ann,

]

Kindly help.

CodePudding user response:

Try this:

"//H5[contains(text(),'Defendant')]/following-sibling::table[not(preceding-sibling::H5[not(contains(text(),'Defendant'))])]/tr[td[1][span[text()[.='Name:' ]]]]/td[2]/span/text()"

It first selects the tables that have no preceding-sibling::h5 whose text() does not contain 'Defendant'. From those tables it then selects the tr whose first td meets your requirements, and finally the text of the span in the second td.

There is no need for double slashes inside the expression; the descendant axis they imply is slower than plain child steps.
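As a rough check, the expression above can be run against a trimmed copy of the question's markup. This is only a sketch under a few assumptions: it uses the third-party lxml library, the wrapping div is invented so that the elements are siblings, and h5 is written in lowercase because lxml's HTML parser lowercases tag names (so "//H5" would match nothing there).

```python
from lxml import html  # third-party: pip install lxml

# Trimmed copy of the question's markup; the outer <div> is an assumption.
SAMPLE = """
<div>
<hr>
<h5>Defendant/Respondent Information</h5>
<span>(Each Defendant/Respondent is displayed below)</span>
<table><tr><td><span>Party Type:</span></td><td><span>Defendant</span></td></tr></table>
<table><tr><td><span>Name:</span></td><td><span>Burrell, Marvin</span></td></tr></table>
<hr>
<table><tr><td><span>Name:</span></td><td><span>Burrell, Frances  Ann</span></td></tr></table>
<hr>
<h5>Related Persons Information</h5>
<table><tr><td><span>Name:</span></td><td><span>Unwanted Name</span></td></tr></table>
</div>
"""

# Same expression as in the answer, with h5 lowercased for lxml.
FIRST_XPATH = (
    "//h5[contains(text(),'Defendant')]"
    "/following-sibling::table"
    "[not(preceding-sibling::h5[not(contains(text(),'Defendant'))])]"
    "/tr[td[1][span[text()[.='Name:']]]]/td[2]/span/text()"
)

tree = html.fromstring(SAMPLE)
# Collapse runs of whitespace inside each name before printing.
names = [" ".join(n.split()) for n in tree.xpath(FIRST_XPATH)]
print(names)  # ['Burrell, Marvin', 'Burrell, Frances Ann']
```

The table under "Related Persons Information" is dropped because it has a preceding h5 sibling whose text lacks 'Defendant'.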

EDIT

Since there are more preceding-sibling::h5 elements than the example shows, this XPath deals with that:

"//H5[contains(text(),'Defendant')]/following-sibling::table[preceding-sibling::H5[1][contains(text(),'Defendant')]]//tr[td[1][span[text()[.='Name:' ]]]]/td[2]/span/text()"

This selects only those tables whose first (nearest) preceding-sibling::h5 is the same h5 we are interested in.
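The difference between the two expressions shows up once another h5 section precedes the defendant block. The sketch below assumes the third-party lxml library; the "Plaintiff/Petitioner Information" section, the name inside it, and the wrapping div are hypothetical additions made only for illustration, and h5 is lowercased because lxml's HTML parser lowercases tag names.

```python
from lxml import html  # third-party: pip install lxml

# The first section and the outer <div> are hypothetical; the rest follows the question.
SAMPLE = """
<div>
<h5>Plaintiff/Petitioner Information</h5>
<table><tr><td><span>Name:</span></td><td><span>Earlier Section Name</span></td></tr></table>
<hr>
<h5>Defendant/Respondent Information</h5>
<table><tr><td><span>Name:</span></td><td><span>Burrell, Marvin</span></td></tr></table>
<hr>
<table><tr><td><span>Name:</span></td><td><span>Burrell, Frances  Ann</span></td></tr></table>
<hr>
<h5>Related Persons Information</h5>
<table><tr><td><span>Name:</span></td><td><span>Unwanted Name</span></td></tr></table>
</div>
"""

NAME_CELL = "tr[td[1][span[text()[.='Name:']]]]/td[2]/span/text()"

# First expression: rejects any table that has a non-Defendant h5 anywhere before it.
FIRST_XPATH = (
    "//h5[contains(text(),'Defendant')]/following-sibling::table"
    "[not(preceding-sibling::h5[not(contains(text(),'Defendant'))])]/" + NAME_CELL
)
# Edited expression: only checks the nearest preceding h5 ([1] on a reverse axis).
SECOND_XPATH = (
    "//h5[contains(text(),'Defendant')]/following-sibling::table"
    "[preceding-sibling::h5[1][contains(text(),'Defendant')]]//" + NAME_CELL
)

tree = html.fromstring(SAMPLE)

def clean(values):
    # Collapse runs of whitespace inside each extracted string.
    return [" ".join(v.split()) for v in values]

first = clean(tree.xpath(FIRST_XPATH))
second = clean(tree.xpath(SECOND_XPATH))
print(first)   # []
print(second)  # ['Burrell, Marvin', 'Burrell, Frances Ann']
```

With the extra leading section, the first expression returns nothing because every defendant table now has a non-Defendant h5 somewhere before it, while the edited expression still returns the two defendant names.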
