HtmlAgilityPack issue

Time:11-16

Suppose I have the following HTML code:

<div class="MyDiv">
<h2>Josh</h2>
</div>


<div class="MyDiv">
<h2>Anna</h2>
</div>


<div class="MyDiv">
<h2>Peter</h2>
</div>

And I want to get the names, so this is what I did (C#):

    string url = "https://...";
    var web = new HtmlWeb();
    HtmlNode[] nodes = null;
    HtmlDocument doc = null;
    doc = web.Load(url);
    nodes = doc.DocumentNode.SelectNodes("//div[@class='MyDiv']").ToArray();
    foreach (HtmlNode n in nodes){
         var name = n.SelectSingleNode("//h2");
         Console.WriteLine(name.InnerHtml);        
    }

Output:

Josh
Josh
Josh

This is strange, because each n contains only the desired <div>. How can I resolve this issue?

Fixed by writing .//h2 instead of //h2

CodePudding user response:

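It's because of your XPath expression "//h2". An expression that starts with "//" is evaluated from the document root, no matter which node you call it on, so SelectSingleNode returns the first h2 in the whole document every time, and that is "Josh". Change it to "h2", which is relative to n and selects only its direct h2 children. ".//h2" also works and matches an h2 at any depth below n.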
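
To compare the three forms side by side, here is a minimal sketch (not the asker's actual code). It assumes the HtmlAgilityPack NuGet package is referenced, and it parses the sample markup from the question with LoadHtml instead of fetching a URL. The class name XPathScopeDemo and the helper NamesUsing are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class XPathScopeDemo
{
    // Sample markup copied from the question; a real page would come from HtmlWeb.Load.
    private const string Html = @"
<div class=""MyDiv""><h2>Josh</h2></div>
<div class=""MyDiv""><h2>Anna</h2></div>
<div class=""MyDiv""><h2>Peter</h2></div>";

    // Hypothetical helper: runs innerXPath against each MyDiv node
    // and collects the text of the first match for each one.
    public static List<string> NamesUsing(string innerXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var divs = doc.DocumentNode.SelectNodes("//div[@class='MyDiv']");
        if (divs == null) return new List<string>();

        return divs
            .Select(div => div.SelectSingleNode(innerXPath))
            .Where(h2 => h2 != null)
            .Select(h2 => h2.InnerText.Trim())
            .ToList();
    }

    public static void Main()
    {
        // Absolute path: searches the whole document, ignoring the current div.
        Console.WriteLine(string.Join(", ", NamesUsing("//h2")));
        // Relative path: any h2 descendant of the current div.
        Console.WriteLine(string.Join(", ", NamesUsing(".//h2")));
        // Relative path: direct h2 children of the current div only.
        Console.WriteLine(string.Join(", ", NamesUsing("h2")));
    }
}
```

Going by the behaviour reported in the question, the "//h2" line should print Josh three times, while ".//h2" and "h2" should each print Josh, Anna, Peter.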

You could also select the h2 nodes directly with a single XPath expression:

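    // Requires: using System.Linq;
    // Note: SelectNodes returns null when nothing matches, so check for null on real pages.
    List<string> names =
        doc.DocumentNode.SelectNodes("//div[@class='MyDiv']/h2")
            .Select(dn => dn.InnerText)
            .ToList();

    foreach (string name in names)
    {
        Console.WriteLine(name);
    }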