I'm trying to pass driver.PageSource from Selenium (C#) to HTML Agility Pack, but the line htmlDoc.Load(driver.PageSource); throws the error: '...' is too long, or a component of the specified path is too long.
P.S. Doing the same thing in Python with Selenium and Beautiful Soup doesn't produce this error.
How can I resolve this?
Full Code:
using System;
using System.Linq; // needed for .ToList()
using System.Threading;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

namespace SeleniumSharp
{
    public static class WebScraping
    {
        public static void GetPageData()
        {
            // initial setup
            IWebDriver driver = new ChromeDriver();
            driver.Navigate().GoToUrl("<url>");

            // dropdown
            var dropdown1 = driver.FindElement(By.Id("cpMain_ucc1_ctl00_liResidentialFront"));
            dropdown1.Click();

            // enter search query
            var search = driver.FindElement(By.Id("cpMain_ucc1_ctl00_txtResidentialSearchBox"));
            search.Click();
            search.SendKeys("london");
            Thread.Sleep(3000);

            // submit search
            var submit = driver.FindElement(By.XPath("//div[@id='cpMain_ucc1_ctl00_pnlContentResidential']//a[@class='search-button']"));
            submit.Click();

            // Html Agility Pack
            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.Load(driver.PageSource);
            var address = htmlDoc.DocumentNode
                .SelectNodes("//div[@class='grid-address']")
                .ToList();
            foreach (var item in address)
            {
                Console.WriteLine(item.InnerText);
            }
        }
    }
}
This line throws the error:
htmlDoc.Load(driver.PageSource);
Error:
'<html source>' is too long, or a component of the specified path is too long.
at System.IO.PathHelper.GetFullPathName(ReadOnlySpan`1 path, ValueStringBuilder& builder)
at System.IO.PathHelper.Normalize(String path)
at System.IO.Path.GetFullPath(String path)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
at System.IO.StreamReader.ValidateArgsAndOpenPath(String path, Encoding encoding, Int32 bufferSize)
at System.IO.StreamReader..ctor(String path, Encoding encoding)
at HtmlAgilityPack.HtmlDocument.Load(String path)
CodePudding user response:
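This happens because you are calling Load instead of LoadHtml. Load expects the path to a file that contains HTML, not the HTML source itself (driver.PageSource). The stack trace shows this: HtmlDocument.Load(String path) hands the string to StreamReader, which tries to open it as a file and fails because the page markup is not a valid path.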
// From a file on disk
var docFromFile = new HtmlDocument();
docFromFile.Load(filePath);
// From a string of HTML
var docFromString = new HtmlDocument();
docFromString.LoadHtml(html);
So use LoadHtml instead:
htmlDoc.LoadHtml(driver.PageSource);
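For completeness, here is a minimal sketch of the parsing step on its own, assuming the HtmlAgilityPack NuGet package is installed. The class name AddressParser, the method ExtractAddresses, and the sample markup are hypothetical and only there for illustration; in the real scraper you would pass driver.PageSource instead of the sample string. It also guards against SelectNodes returning null, which it does when the XPath matches nothing (for example, if the results have not rendered yet when PageSource is read).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class AddressParser
{
    // Hypothetical helper: parses an HTML string and returns the text
    // of every <div class="grid-address"> element it contains.
    public static List<string> ExtractAddresses(string html)
    {
        var htmlDoc = new HtmlDocument();

        // LoadHtml parses markup held in memory; Load would treat the string as a file path.
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = htmlDoc.DocumentNode.SelectNodes("//div[@class='grid-address']");
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(node => node.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        // Made-up sample markup standing in for driver.PageSource.
        const string sample =
            "<html><body><div class='grid-address'>1 Example Street, London</div></body></html>";

        foreach (var address in ExtractAddresses(sample))
        {
            Console.WriteLine(address);
        }
    }
}
```

If ExtractAddresses comes back empty against the live page, the search results have probably not loaded yet, so waiting for them before reading driver.PageSource is worth trying.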