Get specific table from html document with HtmlAgilityPack C#

Time:08-08

I have an HTML document with two tables. For example:

<html>
    <body>
        <p>This is where first table starts</p>
        <table>
            <tr>
                <th>head</th>
                <th>head1</th>
            </tr>
            <tr>
                <td>data</td>
                <td>data1</td>
            </tr>
        </table>
        <p>This is where second table starts</p>
        <table>
            <tr>
                <th>head</th>
                <th>head1</th>
            </tr>
            <tr>
                <td>data</td>
                <td>data1</td>
            </tr>
        </table>
    </body>
</html>

I want to parse the first and the second table separately. Here is what I have so far:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(@richTextBox1.Text);
if (comboBox_tables.Text.Equals("Table1"))
{
    DataTable dt = new DataTable();
    dt.Columns.Add("id", typeof(string));
    dt.Columns.Add("inserted_at", typeof(string));
    dt.Columns.Add("DisplayName", typeof(string));
    HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
    // Note: this loop queries the whole document, not the table selected above.
    foreach (var row in doc.DocumentNode.SelectNodes("//tr"))
    {
        var nodes = row.SelectNodes("td");
        if (nodes != null)
        {
            var id = nodes[0].InnerText;
            var inserted_at = nodes[1].InnerText;
            var DisplayName = nodes[2].InnerText;

            dt.Rows.Add(id, inserted_at, DisplayName);
        }
    }
    dataGridView1.DataSource = dt;
}

I'm trying to select the first table with //table[1], but the loop always picks up the rows of both tables. How can I select the first table for if (Table1) and the second for else if (Table2)?

CodePudding user response:

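You are selecting table[1], but you are not doing anything with the return value: the loop still runs SelectNodes on doc.DocumentNode, so it visits every tr in the document. Use the table variable to select the tr nodes, and make the XPath relative with a leading dot. An expression that starts with // searches from the document root even when it is called on a child node, so table.SelectNodes("//tr") would still return the rows of both tables. For the else if branch, select //table[2] the same way.

HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
foreach (var row in table.SelectNodes(".//tr"))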

.. rest of the code
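
Putting the pieces together, here is a minimal sketch of how both branches could share one routine. The class name TableParser, the method ReadTable, and the generated column names (col0, col1, ...) are hypothetical and not part of the original code. It also assumes the HTML is available as a string and loads it with LoadHtml, whereas the question passes a value to doc.Load. The expression (//table)[n] is used instead of //table[n] so that the index counts tables in document order regardless of their parent element.

```csharp
using System.Data;
using HtmlAgilityPack;

static class TableParser
{
    // Hypothetical helper: copies the <td> cells of the n-th <table>
    // (1-based, in document order) into a DataTable.
    public static DataTable ReadTable(string html, int tableIndex)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // assumption: html holds the markup itself, not a file path

        var dt = new DataTable();

        // (//table)[n] selects the n-th table in the whole document.
        HtmlNode table = doc.DocumentNode.SelectSingleNode($"(//table)[{tableIndex}]");
        if (table == null)
            return dt; // no such table

        // The leading dot keeps the search inside the selected table.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null)
            return dt; // table has no rows

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("td");
            if (cells == null)
                continue; // header rows contain only <th> cells

            // Grow the column list to fit the widest row seen so far.
            while (dt.Columns.Count < cells.Count)
                dt.Columns.Add("col" + dt.Columns.Count, typeof(string));

            DataRow dataRow = dt.NewRow();
            for (int i = 0; i < cells.Count; i++)
                dataRow[i] = cells[i].InnerText.Trim();
            dt.Rows.Add(dataRow);
        }

        return dt;
    }
}
```

With a helper like this, the if / else if pair could reduce to choosing an index, for example dataGridView1.DataSource = TableParser.ReadTable(html, comboBox_tables.Text.Equals("Table1") ? 1 : 2); where html is the string holding the document.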
