I am building an application in which I want to parse some data from a YouTube playlist using HtmlAgilityPack:
https://www.youtube.com/playlist?list=PLDx6vxaCLeUTw0NRQhgYwCaWVf9j-N02Q
I want to use this data in my WPF application. For example, the name of the playlist, which in this case is "Top Hit 2021 ~ Chill Songs ~ At My Wors x Monter x Beautiful Scar".
But my XPath expressions return null every time. When I check them in various online XPath testers they work, but in my application they don't.
For example:
HtmlWeb webDoc = new HtmlWeb();
HtmlDocument docFirst = webDoc.Load("https://www.youtube.com/playlist?list=PLDx6vxaCLeUTw0NRQhgYwCaWVf9j-N02Q");
// title is always null here.
var title = docFirst.DocumentNode.SelectSingleNode(".//*[@id='title']/yt-formatted-string/a/text()");
I don't know how to fix it.
CodePudding user response:
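You always get null because the document you loaded into the docFirst variable doesn't contain the node you are looking for. HtmlWeb.Load only downloads the HTML the server returns and does not execute JavaScript, while YouTube builds that part of the page in the browser. The page therefore has to be rendered first, using a JS engine. A common way to do this is to use Selenium (or any other automation framework, e.g. Playwright).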
This is an example using Selenium together with Html Agility Pack (HAP), but you can omit HAP:
var driver = new ChromeDriver();
driver.Navigate().GoToUrl("https://www.youtube.com/playlist?list=PLDx6vxaCLeUTw0NRQhgYwCaWVf9j-N02Q");
// In a real-world scenario you would most likely use
// the built-in Selenium wait methods instead of pausing here.
Console.ReadLine();
// You will probably also need to close any popup that appears.
// This could be done by clicking a button, for example.
// Take a look at the official Selenium documentation.
// To make this sample code work, close any popup manually
// and then press Enter in the console window.
// Finally, you get the page source, which now includes
// the title you are looking for.
var src = driver.PageSource;
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(src);
var title = htmlDoc.DocumentNode
    .SelectSingleNode("//*[@id='title']/yt-formatted-string/a")
    .InnerHtml;
Console.WriteLine(title);
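If you want to avoid the manual Console.ReadLine() pause, the built-in wait mentioned in the comments could look roughly like the sketch below. It assumes the Selenium.WebDriver, Selenium.Support and HtmlAgilityPack NuGet packages and a local Chrome installation. The 15-second timeout is an arbitrary choice, and the XPath is simply the one from the example above, so it may need adjusting if YouTube changes its markup. It also does not handle a consent popup, which may still block the page in some regions.

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class Program
{
    static void Main()
    {
        const string url = "https://www.youtube.com/playlist?list=PLDx6vxaCLeUTw0NRQhgYwCaWVf9j-N02Q";
        // Same XPath as in the example above; an assumption about the current page markup.
        const string titleXPath = "//*[@id='title']/yt-formatted-string/a";

        // Dispose the driver at the end of Main so the browser is closed.
        using var driver = new ChromeDriver();
        driver.Navigate().GoToUrl(url);

        // Poll until at least one matching element exists in the rendered DOM.
        // Throws WebDriverTimeoutException if nothing matches within 15 seconds.
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
        wait.Until(d => d.FindElements(By.XPath(titleXPath)).Count > 0);

        // Hand the rendered HTML to Html Agility Pack, as in the example above.
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(driver.PageSource);

        // Guard against null in case the node is missing from the page source.
        var node = htmlDoc.DocumentNode.SelectSingleNode(titleXPath);
        Console.WriteLine(node?.InnerText.Trim() ?? "(title not found)");
    }
}
```

Waiting on a condition rather than a fixed delay means the code continues as soon as the title is rendered and fails with a clear exception if it never appears.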