I am starting out in Python, and when I scrape a web page my script won't show the whole list. Here is the code; I was trying to pull the A24 films ranked on IMDb.
from cmath import e
from pydoc import synopsis
from bs4 import BeautifulSoup
import requests
try:
    source = requests.get('https://www.imdb.com/list/ls024372673/')
    source.raise_for_status()
    soup = BeautifulSoup(source.text, 'html.parser')
    movies = soup.find('div', class_="lister-list").find_all('div')
    for movie in movies:
        name = movie.find('h3', class_="lister-item-header").a.text
        rank = movie.find('span', class_="lister-item-index unbold text-primary").text
        year = movie.find('span', class_="lister-item-year text-muted unbold").text
        star = movie.find('span', class_="ipl-rating-star__rating").text
        metascore = movie.find('div', class_="inline-block ratings-metascore").span.text
        score = movie.find('div', class_="list-description").text
        genre = movie.find('span', class_="genre").text
        runtime = movie.find('span', class_="runtime").text
        about = movie.find('p', class_="").text
        elements = movie.findAll('span', attrs={'name': 'nv'})
        votes = elements[0]['data-value']
        gross = elements[1]['data-value']
        print(name, rank, year, star, metascore, score, genre, runtime, about, votes, gross)
except Exception as e:
    print(e)
Answer:
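You should check what happens in your try / except block. It wraps the whole loop, so the first movie that is missing one of the elements raises an exception and ends the loop. Handle missing elements instead, e.g. with a conditional expression: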
'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
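The conditional expression above calls .find() twice. A minimal sketch of the same idea as a small helper, so each field is looked up only once; text_or_none is a hypothetical name, and the HTML is a simplified stand-in for one IMDb list entry (an assumption, not the real page markup):

```python
from bs4 import BeautifulSoup


def text_or_none(parent, name, class_):
    """Hypothetical helper: stripped text of the first matching tag, or None if it is absent."""
    tag = parent.find(name, class_=class_)
    return tag.text.strip() if tag is not None else None


# Simplified stand-in for one list entry (assumed markup, not the real IMDb page)
html = """
<div class="lister-item">
  <h3 class="lister-item-header"><a href="#"> Example Film </a></h3>
  <span class="runtime">100 min</span>
</div>
"""
movie = BeautifulSoup(html, "html.parser").find("div", class_="lister-item")

print(text_or_none(movie, "h3", "lister-item-header"))  # Example Film
print(text_or_none(movie, "span", "runtime"))           # 100 min
print(text_or_none(movie, "div", "list-description"))   # None, instead of an AttributeError
```

A missing field then becomes None in your results, and the loop keeps going to the next movie.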
Example
You could also use a more structured way to hold your results:
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}
page = requests.get('https://www.imdb.com/list/ls024372673/', headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

data = []
for movie in soup.select('.lister-item'):
    # votes and gross share <span name="nv">; look them up before building the dict
    elements = movie.find_all('span', attrs={'name': 'nv'})
    data.append({
        'name': movie.find('h3', class_="lister-item-header").a.text,
        'rank': movie.find('span', class_="lister-item-index unbold text-primary").text,
        'year': movie.find('span', class_="lister-item-year text-muted unbold").text,
        'star': movie.find('span', class_="ipl-rating-star__rating").text,
        'metascore': movie.find('div', class_="inline-block ratings-metascore").span.text,
        'score': movie.find('div', class_="list-description").text if movie.find('div', class_="list-description") else None,
        'genre': movie.find('span', class_="genre").text.strip(),
        'runtime': movie.find('span', class_="runtime").text,
        'about': movie.find('p', class_="").text,
        # some movies may have no gross value, so only index what is there
        'votes': elements[0]['data-value'] if len(elements) > 0 else None,
        'gross': elements[1]['data-value'] if len(elements) > 1 else None
    })

print(data)
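To see the list-of-dicts approach work without hitting IMDb, here is a minimal offline sketch using the same selectors on a simplified stand-in for two list entries (the markup is an assumption and much shorter than the real page). The second movie has no gross span, and it still ends up in the results with None instead of stopping the loop:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for two IMDb list entries (assumed markup, not the real page).
# The second movie has only one <span name="nv">, i.e. votes but no gross.
html = """
<div class="lister-list">
  <div class="lister-item">
    <h3 class="lister-item-header"><a href="#">Film A</a></h3>
    <span name="nv" data-value="1000">1,000</span>
    <span name="nv" data-value="$5.00M">$5.00M</span>
  </div>
  <div class="lister-item">
    <h3 class="lister-item-header"><a href="#">Film B</a></h3>
    <span name="nv" data-value="250">250</span>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

data = []
for movie in soup.select(".lister-item"):
    # votes and gross both live in <span name="nv">; index only what exists
    elements = movie.find_all("span", attrs={"name": "nv"})
    data.append({
        "name": movie.find("h3", class_="lister-item-header").a.text,
        "votes": elements[0]["data-value"] if len(elements) > 0 else None,
        "gross": elements[1]["data-value"] if len(elements) > 1 else None,
    })

for row in data:
    print(row["name"], row["votes"], row["gross"])
# Film A 1000 $5.00M
# Film B 250 None
```

Because each movie is a dict, you can later pick fields by name (row["gross"]) instead of relying on the order of a print call.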
Answer:
movies is not the list of movies you expect. .find() returns only the first matching element, so soup.find('div', class_="lister-list") gives you the single container, and calling .find_all('div') on it returns every nested div inside that container, not one element per movie. Instead, use .find_all() on the soup to search for all the elements with class="lister-item-content", which gives you one element per movie.
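A minimal sketch of the difference, using a simplified stand-in for the list markup (the nested lister-item-image divs are an assumption meant to mimic the extra divs inside each entry):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the list markup (assumed, not the real IMDb page)
html = """
<div class="lister-list">
  <div class="lister-item">
    <div class="lister-item-image"></div>
    <div class="lister-item-content"><h3>Film A</h3></div>
  </div>
  <div class="lister-item">
    <div class="lister-item-image"></div>
    <div class="lister-item-content"><h3>Film B</h3></div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

container = soup.find("div", class_="lister-list")              # .find(): a single Tag
nested = container.find_all("div")                               # every descendant div, at any depth
per_movie = soup.find_all("div", class_="lister-item-content")  # one div per movie

print(len(nested))                     # 6
print(len(per_movie))                  # 2
print([m.h3.text for m in per_movie])  # ['Film A', 'Film B']
```

Only two of the six nested divs contain the movie fields, so looping over all of them makes .find() return None for the others.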
import requests
from bs4 import BeautifulSoup

source = requests.get("https://www.imdb.com/list/ls024372673/")
source.raise_for_status()
soup = BeautifulSoup(source.text, "html.parser")
# one "lister-item-content" div per movie
movies = soup.find_all("div", class_="lister-item-content")
for movie in movies:
    name = movie.find("h3", class_="lister-item-header").find("a").text.strip()
    rank = movie.find("span", class_="lister-item-index unbold text-primary").text.strip()
    year = movie.find("span", class_="lister-item-year text-muted unbold").text.strip()
    stars = movie.find("span", class_="ipl-rating-star__rating").text.strip()
    metascore = movie.find("div", class_="inline-block ratings-metascore").find("span").text.strip()
    # score = movie.find("div", class_="list-description").text  # there is no such div inside movie
    genre = movie.find("span", class_="genre").text.strip()
    runtime = movie.find("span", class_="runtime").text.strip()
    about = movie.find("p", class_="").text.strip()
    elements = movie.find_all("span", attrs={"name": "nv"})
    votes = elements[0]["data-value"]
    gross = elements[1]["data-value"] if len(elements) > 1 else None  # some movies may lack gross
    print(name, rank, year, stars, metascore, genre, runtime, about, votes, gross)
Another problem is the score variable. There is no div with class="list-description" inside your movie element, so .find() returns None, and you will get an error because a NoneType object has no attribute text. I have also added .strip() to remove the surrounding whitespace.
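A minimal sketch of both points on a simplified stand-in for one movie element (assumed markup): .find() returns None for a class that is not there, reading .text from None raises AttributeError, and .strip() removes the whitespace around the genre text:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for one movie element (assumed markup, not the real page)
html = '<div class="lister-item-content"><span class="genre">\n Drama, Horror </span></div>'
movie = BeautifulSoup(html, "html.parser").find("div", class_="lister-item-content")

score_div = movie.find("div", class_="list-description")
print(score_div)  # None: there is no such div in this element

error = None
try:
    _ = score_div.text  # reading an attribute of None
except AttributeError as err:
    error = err
print(error)  # 'NoneType' object has no attribute 'text'

genre_raw = movie.find("span", class_="genre").text
genre = genre_raw.strip()
print(repr(genre_raw), "->", repr(genre))  # '\n Drama, Horror ' -> 'Drama, Horror'
```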
Edit: I agree with HedgeHog. His example is a perfect solution for this type of code structure. Just remember to add .strip().