I'm trying to scrape the IMDb Top 250 movies, and I want to get the links to all of those movies from this page: https://www.imdb.com/chart/top/
I tried:
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://www.imdb.com/chart/top/')
bs = BeautifulSoup(html, 'html.parser')
links = []
for link in bs.find('td', {'class': 'titleColumn'}).find_all('a'):
    links.append(link['href'])
print(links)
But I'm only getting the first link. How can I extend this code to cover the whole list of 250 movies?
CodePudding user response:
bs.find('td', {'class': 'titleColumn'}) gives you only the first entry, and find_all('a') gives you all the <a> tags under that entry. To find the links in all the entries you can use:
for link in bs.select('td.titleColumn > a'):
    links.append(link['href'])
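To see the difference without hitting the site, here is a minimal sketch that runs both approaches against a tiny inline table. SAMPLE_HTML is made-up markup that only assumes the cells look like the ones in the question (a td with class titleColumn containing an <a>); the hrefs are placeholders, not real IMDb IDs.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two rows shaped like the cells in the question.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="/title/tt0000001/">First Movie</a></td></tr>
  <tr><td class="titleColumn">2. <a href="/title/tt0000002/">Second Movie</a></td></tr>
</table>
"""

bs = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# find() stops at the first matching cell, so only that cell's links are collected.
first_only = [a['href'] for a in bs.find('td', {'class': 'titleColumn'}).find_all('a')]

# select() returns every <a> that is a direct child of any matching cell.
all_links = [a['href'] for a in bs.select('td.titleColumn > a')]

print(first_only)
print(all_links)
```

The first list should hold one href and the second should hold one per row, which is the behaviour described in the question.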
If you still want to iterate over the title cells themselves, for example to extract more information from each one, you need to locate all of the cells and extract the <a> from each one:
for title in bs.find_all('td', {'class': 'titleColumn'}):
    links.append(title.find('a')['href'])
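As a rough sketch of what "more information" could look like, the per-cell loop can collect the title text along with an absolute URL. The helper name extract_movies, the sample markup, and the choice of fields are my own assumptions for illustration; adapt them to whatever the real page contains.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = 'https://www.imdb.com/chart/top/'

# Hypothetical sample markup; the third cell has no link on purpose.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="/title/tt0000001/">First Movie</a></td></tr>
  <tr><td class="titleColumn">2. <a href="/title/tt0000002/">Second Movie</a></td></tr>
  <tr><td class="titleColumn">3. (no link)</td></tr>
</table>
"""

def extract_movies(html, base_url=BASE_URL):
    """Return one dict per title cell that contains a link (hypothetical helper)."""
    bs = BeautifulSoup(html, 'html.parser')
    movies = []
    for title in bs.find_all('td', {'class': 'titleColumn'}):
        a = title.find('a')
        # Skip cells without a usable link instead of raising on None.
        if a is None or not a.has_attr('href'):
            continue
        movies.append({
            'title': a.get_text(strip=True),
            # The hrefs are site-relative, so resolve them against the page URL.
            'url': urljoin(base_url, a['href']),
        })
    return movies

movies = extract_movies(SAMPLE_HTML)
for movie in movies:
    print(movie['title'], movie['url'])
```

Checking for a missing <a> keeps the loop from failing with a TypeError if a cell ever lacks a link, which title.find('a')['href'] alone would not handle.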