I need to get specific values from a node with HtmlAgilityPack

Time:11-15

I need to extract some data from a page whose HTML is poorly named. The HTML looks something like this:

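<div class="container-entry">
    <h1 class="entry-heading">Aarakocra</h1>
    <div class="entry-metadata">
        <h2 class="entry-metadata-label">Armor Class: </h2>
        <h2 class="entry-metadata-label">12</h2>
    </div>
    <div class="entry-metadata">
        <h2 class="entry-metadata-label">hit Points: </h2>
        <h2 class="entry-metalabel-content">13 (3d8)</h2>
    </div>
</div>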

In this example, I am trying to get the values "12" and "13 (3d8)"

So far I've tried this:

HtmlAgilityPack.HtmlWeb website = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument pageMonsterStats = website.Load(websiteUrl + "/" + monsterName);
var monsterNode = pageMonsterStats.DocumentNode.SelectSingleNode("//div[@class='container-entry']");
Console.WriteLine(monsterNode.Descendants("div").Where(node => node.Equals("Armor Class: ")).ToString());

I expected to get the index of the element that contains "Armor Class: ", which I would then use to read the value ("12") from the same element. Instead, this prints "System.Linq.Enumerable+WhereEnumerableIterator`1[HtmlAgilityPack.HtmlNode]".

CodePudding user response:

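That is because Where returns an IEnumerable<HtmlNode>, and calling ToString() on it prints the type name of the iterator rather than its contents. Use First or Last to pick a single element, or join the results into a string. Note also that node.Equals("Armor Class: ") compares an HtmlNode with a string, so it never matches; compare the node's InnerText instead.

Console.WriteLine(monsterNode.Descendants("h2").First(node => node.InnerText.Trim() == "Armor Class:").InnerText);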
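
To illustrate the option of joining the output into a string, here is a minimal sketch that collects the text of every matching node and prints it as one line. The JoinExample class, the JoinHeadings method and the " | " separator are hypothetical names and choices made for illustration, and the HTML literal is a cut-down copy of the markup from the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class JoinExample
{
    // Hypothetical helper: returns the trimmed text of every <h2>
    // in the fragment, joined with " | ".
    public static string JoinHeadings(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = doc.DocumentNode
            .Descendants("h2")
            .Select(node => node.InnerText.Trim());

        return string.Join(" | ", texts);
    }

    public static void Main()
    {
        const string html = "<div><h2>Armor Class: </h2><h2>12</h2></div>";
        Console.WriteLine(JoinHeadings(html)); // prints "Armor Class: | 12"
    }
}
```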

In your case you may want to do something like this:

using System;
using System.Linq;

public class Program
{
    public static void Main()
    {
        // Sample markup from the question; "" escapes a quote inside a verbatim string.
        const string html = @"
<div class=""container-entry"">
  <h1 class=""entry-heading"">Aarakocra</h1>
  <div class=""entry-metadata"">
    <h2 class=""entry-metadata-label"">Armor Class: </h2>
    <h2 class=""entry-metadata-label"">12</h2>
  </div>
  <div class=""entry-metadata"">
    <h2 class=""entry-metadata-label"">hit Points: </h2>
    <h2 class=""entry-metalabel-content"">13 (3d8)</h2>
  </div>
</div>
";
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        var monsterNode = doc.DocumentNode.SelectSingleNode("//div[@class='container-entry']");

        // Collect the text of every <h2> inside the inner divs, in document order:
        // [label, value, label, value]
        var data = monsterNode.Descendants("div")
            .SelectMany(x => x.Descendants("h2"))
            .Select(x => x.InnerText)
            .ToArray();
        var armorClass = data[1];
        var hitPoints = data[3]; // if you want

        Console.WriteLine(armorClass); // output 12
    }
}
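
Indexing into the array works as long as the page always lists the stats in the same order. If you would rather look a value up by its label, which is closer to what the question was after, one possible sketch is to pair the two h2 elements of each inner div into a dictionary. The MonsterStats class and ParseStats method are hypothetical names introduced here, and the sketch assumes every inner div holds exactly one label followed by one value and that no label appears twice (ToDictionary throws on duplicate keys).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class MonsterStats
{
    // Hypothetical helper: maps each label (e.g. "Armor Class")
    // to the text of the <h2> that follows it (e.g. "12").
    public static Dictionary<string, string> ParseStats(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            .Select(div => div.Elements("h2").ToList()) // direct <h2> children only
            .Where(h2s => h2s.Count == 2)                // keep label/value pairs
            .ToDictionary(
                h2s => h2s[0].InnerText.Trim().TrimEnd(':'),
                h2s => h2s[1].InnerText.Trim(),
                StringComparer.OrdinalIgnoreCase);       // "hit Points" == "Hit Points"
    }

    public static void Main()
    {
        const string html = @"
<div class=""container-entry"">
  <div class=""entry-metadata"">
    <h2>Armor Class: </h2>
    <h2>12</h2>
  </div>
  <div class=""entry-metadata"">
    <h2>hit Points: </h2>
    <h2>13 (3d8)</h2>
  </div>
</div>";
        var stats = ParseStats(html);
        Console.WriteLine(stats["Armor Class"]); // 12
        Console.WriteLine(stats["Hit Points"]);  // 13 (3d8)
    }
}
```

The lookup is case-insensitive because the page itself is inconsistent about capitalisation ("hit Points"). On the real page you would pass in the outer HTML of the node selected by the container-entry XPath rather than a literal.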

