How to parse html table (from file) by specific ID

Time:10-22

I am trying to get a specific table (by id) from a downloaded HTML file and parse it. I've tried a few ways, and my latest code is:

        var url = @"C:\Users\name\Plocha\web.html";
        var doc = new HtmlDocument();
        doc.Load(url);

        string data = "";
        int i = 2;
        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
        {
            Console.WriteLine($"Found: {table.Id}");
            if (table.Id == "formTbl")
            {
                foreach (HtmlNode row in table.SelectNodes("//tr"))
                {
                    foreach (HtmlNode cell in row.SelectNodes("td"))
                    {
                        if (i == 1)
                        {
                            data += $"Column:{cell.InnerText}";
                            i = 2;
                        }
                        else if (i == 2)
                        {
                            data += $"Row: {cell.InnerText}";
                            Console.WriteLine(data);
                            data = "";
                            i = 1;
                        }
                    }
                }
            }
            else
            {
                Console.WriteLine("Not what we want");
            }
        }

The problem is that it prints the rows of all tables on the web page, even though I have specified to continue only if the id is formTbl.

How the data looks in the table: there are no column headers, just two rows. The first row holds the column names and the second row holds the values.

CodePudding user response:

SelectNodes() takes an XPath query. One XPath rule is particularly relevant to your case: an expression such as //book selects all book elements, no matter where they are in the document.

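This means that "//tr" searches the whole document, even when you call it on a table node. If you want to respect the scope, use a relative path instead: "tr" selects only the direct children of the current node, and ".//tr" selects any descendant of it.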
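
To make the scoping difference concrete, here is a minimal sketch. It assumes the same XPath rules apply as in HtmlAgilityPack, but uses the built-in System.Xml classes so it runs without extra packages. The sample markup and the names XPathScopeDemo and CountFromFormTable are made up for illustration.

```csharp
using System;
using System.Xml;

public static class XPathScopeDemo
{
    // Hypothetical sample markup: two tables, only one has id="formTbl".
    private const string Sample =
        "<html><body>" +
        "<table id='other'><tr><td>x</td></tr></table>" +
        "<table id='formTbl'><tr><td>Name</td></tr><tr><td>Alice</td></tr></table>" +
        "</body></html>";

    // Evaluates the given XPath with the formTbl table as the context node
    // and returns how many nodes it selects.
    public static int CountFromFormTable(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode table = doc.SelectSingleNode("//table[@id='formTbl']");
        return table.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // "//tr" is an absolute path: it starts at the document root,
        // so it also finds the row of the other table.
        Console.WriteLine(CountFromFormTable("//tr"));  // prints 3

        // "tr" is relative: only direct children of the context table.
        Console.WriteLine(CountFromFormTable("tr"));    // prints 2

        // ".//tr" is relative too: any descendant of the context table.
        Console.WriteLine(CountFromFormTable(".//tr")); // prints 2
    }
}
```

The leading dot is what anchors the search to the current node; without it, the path is resolved from the document root regardless of which node SelectNodes is called on.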

You could even use XPath to do the id lookup and select the <tr> elements underneath in a single query:

foreach (var row in doc.DocumentNode.SelectNodes("//table[@id='formTbl']/tr"))
{
    // ...do <tr> stuff
    foreach (var cell in row.SelectNodes("td"))
    {
        // ... do <td> stuff
    }
}

CodePudding user response:

foreach (var table in doc.DocumentNode.SelectNodes("//table[@id='formTbl']"))
{
    foreach (var row in table.SelectNodes("tbody/tr"))
    {
        Console.WriteLine(row.Id);
        foreach (var cell in row.SelectNodes("td"))
        {
            Console.WriteLine(cell.InnerText);
        }
    }
}

The problem was that I hadn't used tbody/tr.
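
A minimal sketch of why tbody matters, assuming the saved file wraps its rows in a <tbody> element (as the fix above suggests). It uses System.Xml as a stand-in for HtmlAgilityPack so it runs without extra packages; the sample markup and the names TbodyDemo and CountRows are hypothetical.

```csharp
using System;
using System.Xml;

public static class TbodyDemo
{
    // Hypothetical sample markup: the rows sit inside <tbody>, not directly in <table>.
    private const string Sample =
        "<table id='formTbl'><tbody>" +
        "<tr><td>Name</td><td>City</td></tr>" +
        "<tr><td>Alice</td><td>Prague</td></tr>" +
        "</tbody></table>";

    // Returns how many nodes the given XPath selects in the sample document.
    public static int CountRows(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // "/tr" only matches direct children of the table, so <tbody> hides the rows.
        Console.WriteLine(CountRows("//table[@id='formTbl']/tr"));       // prints 0

        // Naming the intermediate element finds them.
        Console.WriteLine(CountRows("//table[@id='formTbl']/tbody/tr")); // prints 2

        // "//tr" after the table step matches rows at any depth below it,
        // whether or not a <tbody> is present.
        Console.WriteLine(CountRows("//table[@id='formTbl']//tr"));      // prints 2
    }
}
```

The last form is the more forgiving choice when you are not sure whether the markup contains <tbody>, at the cost of also matching rows of any tables nested inside formTbl.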

Thanks to @NPras
