I am trying to get the author of every video on the YouTube homepage by web-scraping with BeautifulSoup4.
This is the chunk of HTML I am trying to navigate to.
<a class="yt-simple-endpoint style-scope yt-formatted-string" spellcheck="false" href="/c/ApertureScience" dir="auto">Aperture</a>
The page is https://www.youtube.com/
and I am trying to get the text "Aperture".
The problem is that I can't seem to navigate to the data correctly. This is what I have been trying:
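import urllib.request
import bs4 as bs

source = urllib.request.urlopen('https://www.youtube.com/').read()
soup = bs.BeautifulSoup(source, 'lxml')
# Print every <a> tag whose class attribute matches the one above
for i in soup.find_all('a', class_='yt-simple-endpoint style-scope yt-formatted-string'):
    print(i)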
Nothing prints. I think it is because of the spaces in the class name, but I don't know how to get around that.
Any ideas would help. Thank you!
CodePudding user response:
Try this syntax:
soup.find_all('a', {'class': 'yt-simple-endpoint style-scope yt-formatted-string'})
To get 'Aperture' from a matched tag, use its .string, .text, or .contents attribute (.contents returns a list of the tag's children).
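As a rough illustration, here is a minimal sketch that runs the lookup against the anchor tag quoted in the question instead of the live page. The wrapping div and the variable names are only there for the example, and it uses the built-in html.parser so it does not depend on lxml. On this static snippet the dictionary syntax, the class_ keyword from the question, and a CSS selector all find the tag, so the spaces in the class name are probably not what stops the original loop from printing.

```python
from bs4 import BeautifulSoup

# The anchor tag quoted in the question, used as a static stand-in for the page.
html = '''
<div>
  <a class="yt-simple-endpoint style-scope yt-formatted-string" spellcheck="false" href="/c/ApertureScience" dir="auto">Aperture</a>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
target = "yt-simple-endpoint style-scope yt-formatted-string"

# Dictionary syntax suggested above, and the class_ keyword from the question.
by_dict = soup.find_all("a", {"class": target})
by_keyword = soup.find_all("a", class_=target)

# A CSS selector matches the same classes regardless of their order.
by_selector = soup.select("a.yt-simple-endpoint.style-scope.yt-formatted-string")

# .text gives the visible text of each matched tag.
authors = [a.text for a in by_dict]
print(authors)

# .string is the single text child, .contents is the list of children.
print(by_dict[0].string, by_dict[0].contents, by_dict[0]["href"])
```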
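If the content is dynamic (generated by JavaScript after the page loads), the HTML returned by urlopen will not contain it; in that case you could use Selenium to render the page first.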
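A possible shape for the Selenium route is sketched below. It is only a sketch under several assumptions: Selenium 4 and Chrome with a compatible driver are installed, the page still uses the class names from the question (YouTube's markup may well differ), and the helper names extract_authors and fetch_rendered_html are made up for this example.

```python
from bs4 import BeautifulSoup

# Selector built from the class names in the question; the live markup may differ.
AUTHOR_SELECTOR = "a.yt-simple-endpoint.style-scope.yt-formatted-string"


def extract_authors(html):
    """Return the text of every anchor matching AUTHOR_SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(AUTHOR_SELECTOR)]


def fetch_rendered_html(url, timeout=10):
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so extract_authors stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumes a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching anchor exists in the rendered DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, AUTHOR_SELECTOR))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

With a working browser setup, extract_authors(fetch_rendered_html('https://www.youtube.com/')) should then return the author names, provided the selector still matches the rendered page.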