I need to extract some data from a page whose HTML has poorly named classes. The HTML looks something like the following:
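<div class="container-entry">
<h1 class="entry-heading">Aarakocra</h1>
<div class="entry-metadata">
<h2 class="entry-metadata-label">Armor Class: </h2>
<h2 class="entry-metadata-label">12</h2>
</div>
<div class="entry-metadata">
<h2 class="entry-metadata-label">hit Points: </h2>
<h2 class="entry-metalabel-content">13 (3d8)</h2>
</div>
</div>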
In this example, I am trying to get the values "12" and "13 (3d8)".
So far I've tried this:
HtmlAgilityPack.HtmlWeb website = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument pageMonsterStats = website.Load(websiteUrl + "/" + monsterName);
var monsterNode = pageMonsterStats.DocumentNode.SelectSingleNode("//div[@class='container-entry']");
Console.WriteLine(monsterNode.Descendants("div").Where(node => node.Equals("Armor Class: ")).ToString());
I expected to get the index of the element which contains "Armor Class: ", which I would then use to get the value ("12") from the same element, but this prints "System.Linq.Enumerable+WhereEnumerableIterator`1[HtmlAgilityPack.HtmlNode]" instead.
Answer:
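That is because Where returns an IEnumerable<HtmlNode>, and calling ToString() on it prints the type name of the iterator rather than its elements. Use First or Last to get a single node, or join the results into a string. Note also that node.Equals("Armor Class: ") compares an HtmlNode with a string, so it never matches; compare the node's InnerText instead:
Console.WriteLine(monsterNode.Descendants("h2").First(node => node.InnerText.Trim() == "Armor Class:").InnerText);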
In your case you may want to do something like this:
using System;
using System.Linq;

public class Program
{
    public static void Main()
    {
        // Sample markup; "" is an escaped double quote inside a verbatim string.
        const string html = @"
<div class=""container-entry"">
    <h1 class=""entry-heading"">Aarakocra</h1>
    <div class=""entry-metadata"">
        <h2 class=""entry-metadata-label"">Armor Class: </h2>
        <h2 class=""entry-metadata-label"">12</h2>
    </div>
    <div class=""entry-metadata"">
        <h2 class=""entry-metadata-label"">hit Points: </h2>
        <h2 class=""entry-metalabel-content"">13 (3d8)</h2>
    </div>
</div>
";
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var monsterNode = doc.DocumentNode.SelectSingleNode("//div[@class='container-entry']");

        // Collect the text of every <h2> inside the nested <div>s, in document order:
        // label, value, label, value, ...
        var data = monsterNode.Descendants("div")
            .SelectMany(x => x.Descendants("h2"))
            .Select(x => x.InnerText)
            .ToArray();

        var armorClass = data[1]; // value that follows the "Armor Class: " label
        var hitPoints = data[3];  // value that follows the "hit Points: " label, if you need it

        Console.WriteLine(armorClass); // output: 12
    }
}
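If you would rather not rely on fixed positions such as data[1] and data[3], one alternative is to look each value up by its label with XPath. The following is only a sketch under a few assumptions: GetStat and StatLookup are hypothetical names I made up, the value is assumed to be the first <h2> sibling after the label, and the label text must match the page exactly (including the lowercase "hit") and must not contain a single quote.

```csharp
using System;
using HtmlAgilityPack;

public static class StatLookup
{
    // Hypothetical helper: returns the text of the <h2> that directly follows
    // the <h2> whose trimmed text equals `label`, or null if there is no match.
    public static string GetStat(HtmlNode root, string label)
    {
        // normalize-space() trims the trailing space in "Armor Class: ".
        var xpath = $".//h2[normalize-space()='{label}']/following-sibling::h2[1]";
        var valueNode = root.SelectSingleNode(xpath);
        return valueNode?.InnerText.Trim();
    }

    public static void Main()
    {
        const string html = @"
<div class=""container-entry"">
    <div class=""entry-metadata"">
        <h2 class=""entry-metadata-label"">Armor Class: </h2>
        <h2 class=""entry-metadata-label"">12</h2>
    </div>
    <div class=""entry-metadata"">
        <h2 class=""entry-metadata-label"">hit Points: </h2>
        <h2 class=""entry-metalabel-content"">13 (3d8)</h2>
    </div>
</div>
";
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var monsterNode = doc.DocumentNode.SelectSingleNode("//div[@class='container-entry']");

        Console.WriteLine(GetStat(monsterNode, "Armor Class:"));
        Console.WriteLine(GetStat(monsterNode, "hit Points:"));
    }
}
```

Looking values up by label keeps working if the page adds or reorders stat blocks, whereas the index-based version silently picks the wrong entry in that case.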