I'm trying to iterate through each table and extract the headers of each table separately. This is what I've got so far, but whenever I run it, it seems to extract the headers of all tables on every iteration (headerCount goes up to 61 each time).
using System;
using HtmlAgilityPack;

namespace DataCollection
{
    internal class Program
    {
        static void Main(string[] args)
        {
            int headerCount;
            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load("https://en.wikipedia.org/wiki/List_of_actors_who_have_played_the_Doctor");
            //Extracting the tables from the HTML
            foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
            {
                headerCount = 0;
                //Extracting the header cells from each table
                foreach (HtmlNode headerCol in table.SelectNodes("//th"))
                {
                    headerCount++;
                    Console.WriteLine(headerCount);
                }
            }
            Console.ReadLine();
        }
    }
}
What am I doing wrong? Thanks in advance!
CodePudding user response:
I ran into the same problem, and it is counterintuitive.
Calling SelectNodes("//th") on a node searches the entire document again instead of only the descendants of the selected HtmlNode, because an XPath expression that starts with // is evaluated from the document root.
Try using ".//th" instead.
The leading dot makes the expression relative to the current node, so only the header cells inside that table are matched. Hope it helps.
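
To make the difference concrete, here is a minimal standalone sketch that compares the two expressions on a small inline document instead of the Wikipedia page. The class name RelativeXPathDemo, the helper CountPerTable, and the sample markup are made up for illustration; only HtmlDocument.LoadHtml and SelectNodes come from HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

internal static class RelativeXPathDemo
{
    // Hypothetical sample markup: two tables with 2 and 3 header cells.
    private const string SampleHtml =
        "<html><body>" +
        "<table><tr><th>A</th><th>B</th></tr></table>" +
        "<table><tr><th>C</th><th>D</th><th>E</th></tr></table>" +
        "</body></html>";

    // Evaluates xpath from each table node and returns the match count per table.
    internal static int[] CountPerTable(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null) return new int[0];

        var counts = new int[tables.Count];
        for (int i = 0; i < tables.Count; i++)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            HtmlNodeCollection matches = tables[i].SelectNodes(xpath);
            counts[i] = matches == null ? 0 : matches.Count;
        }
        return counts;
    }

    private static void Main()
    {
        // "//th" starts at the document root, so every table sees all 5 headers.
        Console.WriteLine(string.Join(", ", CountPerTable(SampleHtml, "//th")));  // expected: 5, 5
        // ".//th" starts at the current table, so each table sees only its own.
        Console.WriteLine(string.Join(", ", CountPerTable(SampleHtml, ".//th"))); // expected: 2, 3
    }
}
```

The null check matters once you switch to ".//th": a table without any th cells makes SelectNodes return null, which would throw inside a foreach if left unguarded.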