Python Web-scraping youtube.com BeautifulSoup4 problem

Time:11-13

I am trying to get the author of every video on the YouTube homepage by web-scraping with BeautifulSoup4.

This is the chunk of HTML I am trying to navigate to.

<a class="yt-simple-endpoint style-scope yt-formatted-string" spellcheck="false" href="/c/ApertureScience" dir="auto">Aperture</a>

It comes from this page: https://www.youtube.com/

I am trying to extract the text "Aperture".

The problem is that I can't seem to navigate to the data correctly. This is what I have been trying:

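import urllib.request
import bs4 as bs

# Download the raw HTML (no JavaScript is executed) and parse it
source = urllib.request.urlopen('https://www.youtube.com/').read()
soup = bs.BeautifulSoup(source, 'lxml')
for i in soup.find_all('a', class_='yt-simple-endpoint style-scope yt-formatted-string'):
    print(i)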

Nothing prints. I think it is because of the spaces in the class name, but I don't know how to get around that.

Any ideas would help. Thank you!

CodePudding user response:

Try this syntax:

soup.find_all('a', {'class': 'yt-simple-endpoint style-scope yt-formatted-string'})

To get the 'Aperture' text from a matched tag, use its .string or .text attribute (or .contents for the list of child nodes).
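As a rough illustration of the lookup and text extraction described above, here is a minimal sketch that parses the anchor from the question as a static string instead of downloading the page. The variable names are made up for the example, and it uses the built-in "html.parser" rather than lxml so that it has no extra dependency. It also shows a CSS selector as an alternative, which matches the three classes regardless of their order in the attribute.

```python
from bs4 import BeautifulSoup

# The anchor from the question, used here as static input (assumption:
# the rendered page contains markup shaped like this).
html = (
    '<a class="yt-simple-endpoint style-scope yt-formatted-string" '
    'spellcheck="false" href="/c/ApertureScience" dir="auto">Aperture</a>'
)

soup = BeautifulSoup(html, "html.parser")

# A class string containing spaces is compared with the exact value
# of the class attribute.
exact = soup.find_all("a", class_="yt-simple-endpoint style-scope yt-formatted-string")

# A CSS selector requires all three classes but ignores their order.
by_css = soup.select("a.yt-simple-endpoint.style-scope.yt-formatted-string")

# Pull the visible text out of each matched tag.
names = [a.get_text(strip=True) for a in by_css]
print(names)
print(exact[0].string, exact[0]["href"])
```

Since this works on the static snippet, the spaces in the class name are probably not what makes the original loop print nothing.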

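If the content is dynamic (generated by JavaScript after the page loads), it will not be in the HTML that urllib downloads. In that case you could use Selenium to render the page first.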
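The Selenium route might look something like the sketch below. It is only an outline under several assumptions: Chrome and a matching driver are installed, the rendered page still uses the class names from the question, and no consent or login page gets in the way. The function names, the selector constant, and the 15-second timeout are all hypothetical choices for this example. Selenium is imported inside the function so the parsing part can be used without it.

```python
from bs4 import BeautifulSoup

# Assumption: author links still carry the classes shown in the question.
AUTHOR_SELECTOR = "a.yt-simple-endpoint.style-scope.yt-formatted-string"


def fetch_rendered_html(url, timeout=15):
    """Load url in headless Chrome and return the HTML after scripts have run."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching link exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, AUTHOR_SELECTOR))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout


def extract_authors(html):
    """Return (name, href) pairs for every matching link in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a.get("href"))
        for a in soup.select(AUTHOR_SELECTOR)
    ]
```

It could then be called as extract_authors(fetch_rendered_html('https://www.youtube.com/')), which opens a browser session, so expect it to be much slower than a plain urllib request.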
