I am new in c# programming. I am trying to scrape data from div (I want to display temperature from web page in Forms application). This is my code:
private void btnOnet_Click(object sender, EventArgs e)
{
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
var temperatura = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[3]/div/section/div/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]");
onet.Text = temperatura.InnerText;
}
This is the exception:
System.NullReferenceException: temperatura was null.
CodePudding user response:
You can use this:
public static bool TryGetTemperature(HtmlAgilityPack.HtmlDocument doc, out int temperature)
{
temperature = 0;
var temp = doc.DocumentNode.SelectSingleNode(
"//div[contains(@class, 'temperature')]/div[contains(@class, 'temp')]");
if (temp == null)
{
return false;
}
var text = temp.InnerText.EndsWith("°") ?
temp.InnerText.Substring(0, temp.InnerText.Length - 5) :
temp.InnerText;
return int.TryParse(text, out temperature);
}
If you use XPath, you can select with more precission your target. With your query, a bit change in the HTML structure, your application will fail. Some points:
- // is to search in any place of document
- You search any div that contains a class "temperature" and, inside that node:
- you search a div child with "temp" class
- If you get that node (!= null), you try to convert the degrees (removing '°' before)
And check:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
if (TryGetTemperature(doc, out int temperature))
{
onet.Text = temperature.ToString();
}
UPDATE
I updated a bit the TryGetTemperature because the degrees are encoded. The main problem is the HTML. When you request the source code you get some HTML that browser update later dynamically. So the HTML that you get is not valid for you. It doesn't contains the temperature.
So, I see two alternatives:
- You can use a browser control (in Common Controls -> WebBrowser, in the Form Tools with the Button, Label...), insert into your form and Navigate to the page. It's not difficult, but you need learn some things: wait to events for page downloaded and then get source code from the control. Also, I suppose you'll want to hide the browser control. Be carefully, sometimes the browser doesn't works correctly if you hide. In that case, you can use a visible Form outside desktop and manage activate events to avoid activate this window. Also, hide from Task Window (Alt Tab). Things become harder in this way but sometimes is the only way.
- The simple way is search the location that you want (ex: Madryt) and look in DevTools the request done (ex: https://pogoda.onet.pl/prognoza-pogody/madryt-396099). Use this Url and you get a valid HTML.