Scrape data from web page with HtmlAgilityPack c#

Time:05-27

I previously had a problem scraping data from a web page, which was solved in "Scrape data from web page that using iframe c#".

My problem now is that they changed the web page, which is now at https://webportal.thpa.gr/ctreport/container/track. I don't think it uses iframes any more, and I cannot get any data back.

Can someone tell me whether I can use the same method to get data from this web page, or should I use a different approach?

I don't know how @coder_b worked out that I should use https://portal.thpa.gr/fnet5/track/index.php as the URL, or that I should pass the variables like this:

    var reqUrlContent = hc.PostAsync(
            url,
            new StringContent(
                $"d=1&containerCode={reference}&go=1",
                Encoding.UTF8,
                "application/x-www-form-urlencoded"))
        .Result;

EDIT: When I inspect the web page, there is an input that contains the number:

<input type="text" id="report_container_containerno" name="report_container[containerno]" required="required" minlength="11" maxlength="11" placeholder="E/K για αναζήτηση" value="ARKU2215462" />

Can I pass this value using HtmlAgilityPack? Then it should be easy to read the result.
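One way to see which field names a form expects is to load the page's HTML into HtmlAgilityPack and list every named input. The following is only a sketch: the class and method names (`FormInspector`, `PrintInputs`) are made up for illustration, and it assumes you already have the page's HTML as a string. HtmlAgilityPack only parses HTML; it does not submit forms, so the names found here still have to be sent with an HTTP client.

```csharp
using System;
using HtmlAgilityPack;

public static class FormInspector   // hypothetical helper, not part of any library
{
    // Prints the name and current value of every named <input> in the HTML.
    public static void PrintInputs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Query //input directly rather than //form//input, because
        // HtmlAgilityPack may not nest inputs under <form> by default.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var inputs = doc.DocumentNode.SelectNodes("//input[@name]");
        if (inputs == null)
        {
            Console.WriteLine("No named inputs found.");
            return;
        }

        foreach (var input in inputs)
        {
            var name = input.GetAttributeValue("name", "");
            var value = input.GetAttributeValue("value", "");
            Console.WriteLine($"{name} = {value}");
        }
    }
}
```

The names printed this way (for example `report_container[containerno]`) are the keys to use in the POST body.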

Also, when I inspect the DocumentNode, it seems to contain the cookie consent page that I am supposed to accept. Can I bypass it or accept cookies automatically?
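On the cookie question, a possible approach is to let `HttpClient` keep cookies between requests: do a GET first so the server can set its session cookies, then POST the form with the same client. This is a sketch under assumptions: `TrackClient` and `SearchAsync` are illustrative names, the empty `report_container[search]` field is assumed to stand for the form's submit button, and it is not known whether this site actually gates results behind a consent cookie. If the consent banner is only rendered by JavaScript, it is just extra markup in the HTML and can be ignored.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class TrackClient   // hypothetical helper name
{
    public static async Task<string> SearchAsync(string containerNo)
    {
        const string url = "https://webportal.thpa.gr/ctreport/container/track";

        // The handler stores cookies set by responses and sends them back on later requests.
        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(),
            UseCookies = true
        };
        using var client = new HttpClient(handler);

        // First GET: gives the server a chance to set its session cookies.
        using var first = await client.GetAsync(url);
        first.EnsureSuccessStatusCode();

        // FormUrlEncodedContent URL-encodes the keys and values for us.
        // The empty "search" field is an assumption about the submit button.
        var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["report_container[containerno]"] = containerNo,
            ["report_container[search]"] = ""
        });

        using var response = await client.PostAsync(url, form);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

If the server does require a specific consent cookie, it would have to be added to the `CookieContainer` by hand; its name would need to be read from the browser's developer tools.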

CodePudding user response:

Try this:

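// Requires: using System; using System.IO; using System.Net; using System.Text;
public static string Download(string search)
{
    var request = (HttpWebRequest)WebRequest.Create("https://webportal.thpa.gr/ctreport/container/track");

    // Form fields as the page's search form submits them.
    // Escape the value so special characters cannot break the form body.
    var postData = string.Format(
        "report_container[containerno]={0}&report_container[search]=",
        Uri.EscapeDataString(search));
    var data = Encoding.UTF8.GetBytes(postData);

    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;

    // Write the form body to the request stream.
    using (var stream = request.GetRequestStream())
    {
        stream.Write(data, 0, data.Length);
    }

    // Read the whole response body (the result page's HTML) as a string.
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}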

Usage:

var html = Download("ARKU2215462");
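
Once the HTML is downloaded, HtmlAgilityPack can be used to read the result. The sketch below assumes the tracking data is rendered in an ordinary HTML `<table>`, which has not been verified for this page; `ResultParser` and `PrintTableRows` are illustrative names. Adjust the XPath to the page's actual markup.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ResultParser   // hypothetical helper name
{
    // Prints each table row as "cell | cell | ..." (assumes results live in a <table>).
    public static void PrintTableRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches the XPath.
        var rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null)
        {
            Console.WriteLine("No table rows found.");
            return;
        }

        foreach (var row in rows)
        {
            // Take header and data cells that are direct children of the row.
            var cells = row.SelectNodes("./th|./td");
            if (cells == null) continue;

            // Decode HTML entities such as &amp; and trim surrounding whitespace.
            var texts = cells.Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim());
            Console.WriteLine(string.Join(" | ", texts));
        }
    }
}
```

It could be called as `ResultParser.PrintTableRows(html);` with the `html` string returned by `Download`.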