Error when trying to parse YouTube with Html Agility Pack in C#

Time:07-17

I am building an application in which I want to parse some data from a YouTube playlist using Html Agility Pack: https://www.youtube.com/playlist?list=PLDx6vxaCLeUTw0NRQhgYwCaWVf9j-N02Q

I want to use this data in my WPF application, for example the name of the playlist ("Top Hit 2021 ~ Chill Songs ~ At My Wors x Monter x Beautiful Scar" in this case). But my XPath expressions return null every time. When I check them in various XPath testers they work, but in my application they don't.

For example:

    HtmlWeb webDoc = new HtmlWeb();
    HtmlDocument docFirst = webDoc.Load("https://www.youtube.com/playlist?list=PLDx6vxaCLeUTw0NRQhgYwCaWVf9j-N02Q");

    // title is always null here.
    var title = docFirst.DocumentNode.SelectSingleNode(".//*[@id='title']/yt-formatted-string/a/text()");

I don't know how to fix it.

CodePudding user response:

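You always get null because the document loaded into the docFirst variable doesn't contain the node you are looking for. HtmlWeb.Load only downloads the HTML the server sends; it does not run scripts, and this page has to be rendered by a JS engine before that node exists. XPath testers typically run against the already rendered DOM, which would explain why the same expression matches there. A common approach is to use Selenium (or another automation framework, e.g. Playwright) to render the page first.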
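To make the cause concrete, here is a minimal sketch of what Html Agility Pack sees when the target node is created by a script. The `rawHtml` string is hypothetical markup invented for illustration, not YouTube's actual response; the point is only that HAP parses the text it is given and never executes the script.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical stand-in for a server response: the element with id 'title'
// is not in the markup, a script would create it later in a browser.
const string rawHtml =
    "<html><body><div id='app'></div>" +
    "<script>/* builds the #title element at runtime */</script>" +
    "</body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(rawHtml);

// HAP does not run the script, so there is nothing for the XPath to match.
var node = doc.DocumentNode.SelectSingleNode("//*[@id='title']/yt-formatted-string/a");

Console.WriteLine(node == null ? "node not found in raw HTML" : node.InnerText);
```

SelectSingleNode returns null when nothing matches, which is the behaviour described in the question.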

This is an example using Selenium together with Html Agility Pack (HAP), but you can omit HAP:

    using System;
    using HtmlAgilityPack;
    using OpenQA.Selenium.Chrome;

    var driver = new ChromeDriver();
    driver.Navigate().GoToUrl("https://www.youtube.com/playlist?list=PLDx6vxaCLeUTw0NRQhgYwCaWVf9j-N02Q");

    // You will probably need to close any popup that appears,
    // for example by clicking a button. See the official Selenium page.
    // To make this sample work, dismiss any popup manually,
    // then press Enter in the console window.
    // In a real-world scenario you would most likely use
    // Selenium's built-in wait methods instead of blocking here.
    Console.ReadLine();

    // Finally, get the rendered page source, which includes
    // the title you are looking for.
    var src = driver.PageSource;
    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(src);

    var title = htmlDoc.DocumentNode
                       .SelectSingleNode("//*[@id='title']/yt-formatted-string/a")
                       .InnerHtml;

    Console.WriteLine(title);

    // Close the browser and release the driver.
    driver.Quit();
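The manual Console.ReadLine() pause can be replaced by an explicit wait, as the comments in the sample suggest. The following is a rough sketch, assuming the Selenium.Support package (which provides WebDriverWait) is installed; the 15-second timeout is an arbitrary choice, and the XPath is taken from the question and may not match YouTube's current markup. A consent popup, if one appears, would still have to be handled separately.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

// 'using var' disposes the driver (and closes the browser) at the end.
using var driver = new ChromeDriver();
driver.Navigate().GoToUrl("https://www.youtube.com/playlist?list=PLDx6vxaCLeUTw0NRQhgYwCaWVf9j-N02Q");

// Poll until the title link exists, or throw WebDriverTimeoutException
// after the (assumed) 15-second limit.
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
IWebElement link = wait.Until(d =>
{
    var found = d.FindElements(By.XPath("//*[@id='title']/yt-formatted-string/a"));
    return found.Count > 0 ? found[0] : null; // null tells Until to keep waiting
});

Console.WriteLine(link.Text);
```

Reading the text directly from the Selenium element is the variant that omits HAP; if you prefer HAP for the parsing, pass driver.PageSource to HtmlDocument.LoadHtml once the wait has succeeded.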

Selenium official page
