HTML Agility Pack foreach loop not iterating for data grid (C#)

Time:03-14

I'm a beginner programmer working on a small web scraper in C#. The goal is to take a hospital's public website, grab each doctor's name, department, phone number and diploma info, and display them in a DataGridView. It's a public website, and as far as I can tell the site's robots.txt allows this, so I left everything in the code as it is.

I am able to grab each piece of data (name, department, phone, diploma) separately, and I can successfully display it in a text box.

// THIS WORKS:
string text = "";
foreach (var nodes in full)
{
    text += nodes.InnerText + "\r\n";
}
textBox1.Text = text;

However, when I try to pass the data on to the DataGridView using a class, the foreach loop only picks up the first doctor and fills the data grid with that doctor's data.

foreach (var nodes in full)
{
    var Doctor = new Doctor
    {
        Col1 = full[0].InnerText,
        Col2 = full[1].InnerText,
        Col3 = full[2].InnerText,
        Col4 = full[3].InnerText,
    };
    Doctors.Add(Doctor);
}

I spent a good few hours looking for solutions, but nothing I've found has worked, and I can't tell whether I messed up the foreach loop somehow or whether I'm not following HTML Agility Pack's rules. The loop iterates fine when I build the text box string, but not when I create Doctor objects. Changing full[0] to nodes[0] or nodes.InnerText doesn't seem to solve it either.

link to public gist file (where you can see my whole code)

screenshot

Thank you for the help in advance!

Answer:

The problem is how you're selecting the nodes from the page. full contains all the individual names, departments, etc. in one flat list, so full[0] is the name of the first doctor while full[4] is the name of the second. Your foreach loop doesn't take that into account: on every iteration it reads full[0] to full[3], which are always the properties of the first doctor.
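If you would rather keep the flat full list, the loop has to advance by one doctor's worth of entries per iteration instead of always reading indices 0 to 3. A minimal sketch, assuming every doctor contributes exactly four nodes in the order name, department, phone, diploma; the sample strings are hypothetical stand-ins for the nodes' InnerText (with HtmlAgilityPack you would read full[i].InnerText), and the Doctor class is assumed to mirror the question's Col1 to Col4 properties:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the InnerText of the nodes in `full`:
// four consecutive entries (name, department, phone, diploma) per doctor.
var full = new List<string>
{
    "Dr. A", "Cardiology", "111", "MD",
    "Dr. B", "Neurology", "222", "PhD",
};

const int fieldsPerDoctor = 4;
var doctors = new List<Doctor>();

// Move forward one doctor (four entries) per iteration instead of
// reading full[0]..full[3] every time.
for (int i = 0; i + fieldsPerDoctor <= full.Count; i += fieldsPerDoctor)
{
    doctors.Add(new Doctor
    {
        Col1 = full[i],     // name
        Col2 = full[i + 1], // department
        Col3 = full[i + 2], // phone
        Col4 = full[i + 3], // diploma
    });
}

foreach (var d in doctors)
{
    Console.WriteLine($"{d.Col1} | {d.Col2} | {d.Col3} | {d.Col4}");
}

// Assumed to match the question's Doctor class: four string columns.
public class Doctor
{
    public string Col1 { get; set; } = "";
    public string Col2 { get; set; } = "";
    public string Col3 { get; set; } = "";
    public string Col4 { get; set; } = "";
}
```

This stride approach is fragile: if one doctor on the page is missing a field, every later row shifts by one. Selecting one card per doctor and querying inside that card avoids the problem.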

To make your code more readable, I'd split it up a bit: first select the card element of each doctor, then select the individual parts within the loop:

using System.Collections.Generic;
using HtmlAgilityPack;

var web = new HtmlWeb();
// Load() downloads and parses the page in one step
var doc = web.Load("https://klinikaikozpont.unideb.hu/doctor_finder");

// XPath fragments: one card-content element per doctor
const string doctorListItem = "div[contains(@class, 'doctor-list-item-model')]";
const string cardContent = "div[contains(@class, 'card-content')]";
const string departmentNode = "div[contains(@class, 'department-name')]";

var doctorCards = doc.DocumentNode.SelectNodes($"//{doctorListItem}/{cardContent}");

var doctors = new List<Doctor>();
if (doctorCards != null) // SelectNodes returns null when nothing matches
{
    foreach (var card in doctorCards)
    {
        // The leading "./" keeps each query relative to the current card
        var name = card.SelectSingleNode("./h3")?.InnerText;
        var department = card.SelectSingleNode($"./{departmentNode}/p")?.InnerText;
        // other properties...

        doctors.Add(new Doctor { NameAndTitle = name, Department = department });
    }
}

// I took the liberty of renaming the properties to make this class easier to understand
public class Doctor
{
    public string NameAndTitle { get; set; }
    public string Department { get; set; }
    // Add other properties
}

Check out the code in action.
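The key detail in the loop is the leading ./ in each query, which makes it relative to the current card. HtmlAgilityPack's SelectSingleNode follows standard XPath rules, so a query starting with // searches the whole document even when it is called on a card node, which would bring back the "first doctor only" symptom. The sketch below illustrates that difference with .NET's built-in System.Xml.XPath extensions instead of HtmlAgilityPack, so it runs without the NuGet package; the markup is a hypothetical, simplified, well-formed stand-in for the real page (real-world HTML usually won't parse as XML, which is why the answer uses HtmlAgilityPack).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical, simplified markup; XDocument requires well-formed XML.
const string html =
    @"<html><body>
  <div class='doctor-list-item-model'><div class='card-content'>
    <h3>Dr. A</h3><div class='department-name'><p>Cardiology</p></div>
  </div></div>
  <div class='doctor-list-item-model'><div class='card-content'>
    <h3>Dr. B</h3><div class='department-name'><p>Neurology</p></div>
  </div></div>
</body></html>";

var doc = XDocument.Parse(html);

// One element per doctor card
var cards = doc.XPathSelectElements(
        "//div[contains(@class, 'doctor-list-item-model')]/div[contains(@class, 'card-content')]")
    .ToList();

foreach (var card in cards)
{
    // "./h3" is evaluated relative to the current card
    var name = (string?)card.XPathSelectElement("./h3");
    var department = (string?)card.XPathSelectElement(
        "./div[contains(@class, 'department-name')]/p");

    // "//h3" starts from the document root, so it always finds the first <h3> on the page
    var fromRoot = (string?)card.XPathSelectElement("//h3");

    Console.WriteLine($"{name} | {department} | //h3 gives: {fromRoot}");
}
```

Each line prints that card's own name and department, while the //h3 column shows "Dr. A" for both cards.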
