I'm using HTML Agility Pack to load this document, and the section I'm looking at is below:
<TR>
<TH SCOPE="ROW" align=right><A
href="saferhelp.aspx#">MCS-150 Form Date:</A>
</TH>
<TD valign=top>06/17/2022 </TD>
<TH SCOPE="ROW" align=right><A
href="saferhelp.aspx#">MCS-150 Mileage (Year):</A>
</TH>
<TD valign=top><FONT style=font-size:80% face=arial color=#0000C0><B>92,087 (2020)
</TD>
</TR>
I'm trying to scrape the value for MCS-150 Form Date from the government FMCSA database, but the value cell has no unique identifier. The only hook is the header cell before it, whose inner text is "MCS-150 Form Date:".
Is it possible to use SelectSingleNode (or another method) to find the anchor tag that matches that string, so that I can move from its parent TH to the next TD sibling and get the value?
I've been trying to write a proper XPath, but to no avail. The line of code:
HtmlNode nodesInDiv = htmlDoc.DocumentNode.SelectSingleNode(mcsFormDatePath);
CodePudding user response:
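One of these selectors should work. Note that HTML Agility Pack lowercases element names when it parses, and XPath is case-sensitive, so the selector needs a (or th, td) rather than A. A triple slash (///) is not valid XPath either.
To select the anchor itself:
//a[normalize-space()='MCS-150 Form Date:']
or, to go straight to the value cell that follows the header:
//th[normalize-space()='MCS-150 Form Date:']/following-sibling::td[1]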
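For illustration, here is a minimal sketch of how a selector like this might be used end to end. It assumes the HtmlAgilityPack NuGet package is referenced; the inline HTML string is a trimmed copy of the snippet from the question rather than the live FMCSA page, and the variable names valueCell and formDate are made up for this example.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Trimmed copy of the row from the question (assumption: the live
        // page has the same TH > A / TD layout around this field).
        const string html = @"
<TR>
<TH SCOPE=""ROW"" align=right><A
href=""saferhelp.aspx#"">MCS-150 Form Date:</A>
</TH>
<TD valign=top>06/17/2022 </TD>
</TR>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Lowercase element names: HTML Agility Pack normalizes tag names.
        // Find the anchor by its text, step up to the header cell, then
        // take the first TD that follows it.
        const string mcsFormDatePath =
            "//a[normalize-space()='MCS-150 Form Date:']" +
            "/parent::th/following-sibling::td[1]";

        HtmlNode valueCell = htmlDoc.DocumentNode.SelectSingleNode(mcsFormDatePath);

        // SelectSingleNode returns null when nothing matches, so guard
        // before reading InnerText. Trim removes the trailing space.
        string formDate = valueCell?.InnerText.Trim();

        Console.WriteLine(formDate ?? "MCS-150 Form Date not found");
    }
}
```

Doing the navigation inside the XPath keeps the lookup in one expression. If you would rather select the anchor first, you could instead walk from it in code via ParentNode and then skip over whitespace text nodes with NextSibling until you reach the td element.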