web scraping in python it won't show the whole list

Time:06-26

I am just starting out in Python, and my web scraping script won't show the whole list. My code is below; I am trying to pull the A24 films ranked on IMDb.

from cmath import e
from pydoc import synopsis
from bs4 import BeautifulSoup
import requests


try:
    source =requests.get('https://www.imdb.com/list/ls024372673/')
    source.raise_for_status()  

    soup=BeautifulSoup(source.text,'html.parser')
    movies=soup.find('div',class_="lister-list").find_all('div')
   
    for movie in movies :
        name= movie.find('h3',class_="lister-item-header").a.text

        rank= movie.find('span',class_="lister-item-index unbold text-primary").text
        
        year= movie.find('span',class_="lister-item-year text-muted unbold").text

        star= movie.find('span',class_="ipl-rating-star__rating").text
        
        metascore= movie.find('div',class_="inline-block ratings-metascore").span.text

        score=movie.find('div',class_="list-description").text

        genre=movie.find('span',class_="genre").text
        
        runtime=movie.find('span',class_="runtime").text

        about=movie.find('p',class_="").text
       
        elements = movie.findAll('span', attrs = {'name':'nv'})
        votes = elements[0]['data-value']
        gross = elements[1]['data-value']

    print(name,rank,year,star,metascore,score,genre,runtime,about,votes,gross)
except Exception as e:
         print(e) 

CodePudding user response:

You should check what happens inside your try/except block and handle missing elements explicitly, e.g. with conditional expressions:

'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
Example

You also could use a more structured way to hold your results:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}
page = requests.get('https://www.imdb.com/list/ls024372673/', headers=headers)
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(page.content, 'html.parser')

data = []
for movie in soup.select('.lister-item'):
    # Votes and gross are both stored as <span name="nv" data-value="...">
    elements = movie.find_all('span', attrs={'name': 'nv'})
    data.append({
        'name': movie.find('h3',class_="lister-item-header").a.text,
        'rank': movie.find('span',class_="lister-item-index unbold text-primary").text,
        'year': movie.find('span',class_="lister-item-year text-muted unbold").text,
        'star': movie.find('span',class_="ipl-rating-star__rating").text,
        'metascore': movie.find('div',class_="inline-block ratings-metascore").span.text,
        'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
        'genre': movie.find('span',class_="genre").text.strip(),
        'runtime': movie.find('span',class_="runtime").text,
        'about': movie.find('p',class_="").text,
        # Guard the indexing: not every item has both values
        'votes': elements[0]['data-value'] if len(elements) > 0 else None,
        'gross': elements[1]['data-value'] if len(elements) > 1 else None,
    })
data
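
The conditional used for 'score' above can be factored into a small helper so that every field is guarded the same way. This is only a minimal sketch, not part of the original answer: text_or_none is a hypothetical helper name, and the inline HTML is a made-up stand-in for one list item, so the real page may look different.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for one list item; the real page may differ.
SAMPLE_HTML = """
<div class="lister-item">
  <h3 class="lister-item-header"><a href="#">Moonlight</a></h3>
  <span class="genre">
Drama            </span>
</div>
"""


def text_or_none(parent, name, class_):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
movie = soup.select_one(".lister-item")

row = {
    "genre": text_or_none(movie, "span", "genre"),
    # The sample has no "runtime" span, so this becomes None
    # instead of raising AttributeError.
    "runtime": text_or_none(movie, "span", "runtime"),
}
print(row)
```

With a helper like this, each dictionary entry stays on one line and a missing field no longer aborts the whole run.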

CodePudding user response:

movies is not a list of movies. .find() returns only the first matching element; to get every match you have to use .find_all(), which returns a list.

Also, you are looking for all the items inside the element with class="lister-list", but this way you will not get a list of movies. You should search for all the elements with class="lister-item-content".

source = requests.get("https://www.imdb.com/list/ls024372673/")
source.raise_for_status()  

soup = BeautifulSoup(source.text, "html.parser")
movies = soup.find_all("div", class_="lister-item-content")

for movie in movies:
    name      = (movie.find("h3", class_="lister-item-header").find("a").text).strip()
    rank      = (movie.find("span", class_="lister-item-index unbold text-primary").text).strip()
    year      = (movie.find("span", class_="lister-item-year text-muted unbold").text).strip()
    stars     = (movie.find("span", class_="ipl-rating-star__rating").text).strip()
    metascore = (movie.find("div", class_="inline-block ratings-metascore").find("span").text).strip()
    # score   = movie.find("div", class_="list-description").text  # this class does not exist inside movie
    genre     = (movie.find("span", class_="genre").text).strip()
    runtime   = (movie.find("span", class_="runtime").text).strip()
    about     = (movie.find("p", class_="").text).strip()

    elements = movie.findAll("span", attrs = {"name":"nv"})
    votes    = elements[0]['data-value']
    gross    = elements[1]['data-value']
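
To see the difference between .find() and .find_all() in isolation, here is a small sketch that runs offline. The markup is made up and only mimics the class names used above; it is not copied from IMDb.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the class names used above.
SAMPLE_HTML = """
<div class="lister-list">
  <div class="lister-item"><div class="lister-item-content"><h3 class="lister-item-header"><a>First</a></h3></div></div>
  <div class="lister-item"><div class="lister-item-content"><h3 class="lister-item-header"><a>Second</a></h3></div></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# .find() returns a single Tag (the first match) or None.
first = soup.find("div", class_="lister-item-content")

# .find_all() returns a list-like ResultSet with every match.
contents = soup.find_all("div", class_="lister-item-content")

# The approach from the question: every nested <div>, wrappers included.
all_divs = soup.find("div", class_="lister-list").find_all("div")

names = [c.find("h3", class_="lister-item-header").a.text for c in contents]
print(first.a.text, len(contents), len(all_divs), names)
```

In this sample the question's selector yields twice as many divs as there are movies, because it also picks up the wrapper around each item.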

Another problem is the score variable. There is no div with class="list-description" inside your movie element, so .find() returns None and you get an error because a NoneType object has no attribute text. I have also added .strip() to remove the surrounding whitespace.
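
This failure can be reproduced offline. The sketch below uses made-up HTML and a hypothetical parse_item function. It shows that accessing .text on a missing tag raises AttributeError, and that catching the error per item, rather than around the whole loop as in the question, lets the remaining items be processed.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second item has no rating span.
SAMPLE_HTML = """
<div class="lister-item-content"><span class="ipl-rating-star__rating"> 7.4 </span></div>
<div class="lister-item-content"></div>
<div class="lister-item-content"><span class="ipl-rating-star__rating"> 8.1 </span></div>
"""


def parse_item(movie):
    """Hypothetical parser: raises AttributeError if the rating span is missing."""
    return movie.find("span", class_="ipl-rating-star__rating").text.strip()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ratings, errors = [], []

for movie in soup.find_all("div", class_="lister-item-content"):
    try:
        ratings.append(parse_item(movie))
    except AttributeError as exc:
        # Only this item is skipped; the loop continues with the next one.
        errors.append(str(exc))

print(ratings)
print(errors)
```

Note that in the question the print call also sits outside the for loop, so even without an error only the last movie would be printed.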

Edit: I agree with HedgeHog. His example is a very good fit for this kind of code. Just remember to add the .strip().
