How can I extract the headers from multiple HTML tables using C# HTMLAgilityPack?

Time:01-29

I'm trying to iterate through each table and extract the headers of each table separately. This is what I've got so far, but whenever I run it, it seems to extract the headers of all tables on every iteration (headerCount goes up to 61 each time).

using System;
using HtmlAgilityPack;

namespace DataCollection
{
    internal class Program
    {
        static void Main(string[] args)
        {
            int headerCount;
            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load("https://en.wikipedia.org/wiki/List_of_actors_who_have_played_the_Doctor");
            //Extracting the tables from the HTML
            foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
            {
                headerCount = 0;
                //Extracting the header cells from each table
                foreach (HtmlNode headerCol in table.SelectNodes("//th"))
                {
                    headerCount++;
                    Console.WriteLine(headerCount);
                }
             
            };
           
            Console.ReadLine();
        }
    }
}

What am I doing wrong? Thanks in advance!

CodePudding user response:

I ran into the same problem, and it isn't intuitive.

Calling SelectNodes("//th") on a node still searches the entire document instead of just the selected HtmlNode, because an XPath expression that starts with "//" is evaluated from the document root. It's weird, I know.

Try using ".//th"

Placing a dot at the start makes the expression relative, so it searches through the current node's descendants and not the entire HtmlDocument again. Hope it helps.
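
For reference, here is a minimal sketch of the relative-XPath version. It is not your exact program: the HeaderDemo class, the CountHeadersPerTable helper and the inline sample HTML are made up for illustration, and it parses a string with LoadHtml instead of downloading the Wikipedia page, so treat it as a starting point rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

internal static class HeaderDemo
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the number
    // of <th> cells found inside each <table> of the given HTML string.
    public static List<int> CountHeadersPerTable(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var counts = new List<int>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null)
        {
            return counts;
        }

        foreach (HtmlNode table in tables)
        {
            // ".//th" is evaluated relative to the current table,
            // whereas "//th" would start again from the document root.
            HtmlNodeCollection headers = table.SelectNodes(".//th");
            counts.Add(headers?.Count ?? 0);
        }

        return counts;
    }

    private static void Main()
    {
        // Made-up sample markup: two tables with 2 and 1 header cells.
        const string html =
            "<table><tr><th>A</th><th>B</th></tr></table>" +
            "<table><tr><th>C</th></tr></table>";

        Console.WriteLine(string.Join(", ", CountHeadersPerTable(html)));
    }
}
```

With the sample markup this should print one count per table instead of the combined total. Note that ".//th" also matches header cells of any tables nested inside the current one, and that the null check matters because a table without any th cells makes SelectNodes return null.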
