Selecting an anchor node by matching innertext with HTML Agility Pack

Time:10-21

I'm using HTML Agility Pack to load this document, and the section I'm looking at is below:

<TR>
 <TH SCOPE="ROW"  align=right><A  
 href="saferhelp.aspx#">MCS-150 Form Date:</A>
 </TH>
   <TD  valign=top>06/17/2022&nbsp;</TD>
 <TH SCOPE="ROW"  align=right><A  
 href="saferhelp.aspx#">MCS-150 Mileage (Year):</A>
 </TH>
  <TD valign=top><FONT style=font-size:80% face=arial color=#0000C0><B>92,087 (2020)&nbsp; 
  </TD>
</TR>

I'm trying to scrape the value for MCS-150 Form Date from the government FMCSA database, but the cell has no unique identifier other than its sibling header, whose inner text is "MCS-150 Form Date:".

Is it possible to use SelectSingleNode (or another method) to find the anchor tag that matches that string, so that I can move from its TH to the next sibling TD and get the value?

I've been trying to write a proper XPath expression, but to no avail. Code line:

HtmlNode nodesInDiv = htmlDoc.DocumentNode.SelectSingleNode(mcsFormDatePath);

CodePudding user response:

One of these selectors should work:

//A[normalize-space()='MCS-150 Form Date:']

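or, since HTML Agility Pack exposes element names in lowercase regardless of how they are written in the source:

//a[normalize-space()='MCS-150 Form Date:']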
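
To get from the matched anchor to the date itself, the selector can be extended to step up to the enclosing TH and across to the next TD. The following is only a minimal sketch under a few assumptions: the class name Mcs150Scraper and the method GetFormDate are hypothetical names made up for illustration, the HtmlAgilityPack NuGet package is referenced, and the page markup matches the snippet in the question.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

// Hypothetical helper class, named here only for illustration.
public static class Mcs150Scraper
{
    // Returns the text of the TD that follows the "MCS-150 Form Date:" header,
    // or null when no matching anchor/cell is found.
    public static string GetFormDate(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Lowercase element names are used because HTML Agility Pack
        // exposes node names in lowercase.
        // 1. find the anchor whose normalized text matches the label
        // 2. walk up to its enclosing <th>
        // 3. take the first <td> that follows that <th>
        const string xpath =
            "//a[normalize-space()='MCS-150 Form Date:']" +
            "/ancestor::th[1]/following-sibling::td[1]";

        HtmlNode cell = htmlDoc.DocumentNode.SelectSingleNode(xpath);
        if (cell == null)
        {
            return null;
        }

        // InnerText keeps entities such as &nbsp; so decode them first,
        // then turn non-breaking spaces into plain spaces and trim.
        return HtmlEntity.DeEntitize(cell.InnerText)
            .Replace('\u00A0', ' ')
            .Trim();
    }
}
```

Calling Mcs150Scraper.GetFormDate with the page source as a string should return the cell text with the trailing &nbsp; removed. SelectSingleNode returns null when nothing matches, so the null check matters if the page layout ever changes.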