Python BeautifulSoup failure to get data from a div with a certain class

I am working on a program that scrapes Metacritic for information on the movies in my library and displays it. In certain parts, such as grabbing the rating, it always returns nothing. What am I doing wrong?

from bs4 import BeautifulSoup
import requests
import os

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link = "https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers) 
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
    g_data = soup.find_all("div", {"class": "movie_rating"})
    print(g_data)

    if g_data!= []:
        return g_data[0].text
    else:
        return "Failed"

def getMovieInfo():
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_0) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/58.0.849.0 Safari/536.1'}
    
    for movie in os.listdir("D:/Movies/"):
        movie = movie.lower().replace(".mp4","")
        print(movie)
        print("Rating: " + ratingsGet(headers, movie))
        # rYearGet() is not shown in the question
        print("Home release year: " + rYearGet(headers, movie))
        break
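The loop in getMovieInfo() only removes ".mp4", so files with any other extension keep their suffix in the URL. A minimal sketch of a slug helper, assuming Metacritic's slug is simply the lowercased title with spaces replaced by hyphens (the same rule the question uses; titles with punctuation may need extra handling). The name detail_url_for is hypothetical.

```python
import os


def detail_url_for(filename: str) -> str:
    """Build a Metacritic details URL from a movie file name.

    Assumption: the slug is the lowercased title with spaces replaced
    by hyphens, mirroring the rule used in the question.
    """
    # splitext drops whatever the last extension is (.mp4, .mkv, ...)
    title, _ext = os.path.splitext(filename)
    slug = title.strip().lower().replace(" ", "-")
    return "https://www.metacritic.com/movie/" + slug + "/details"


print(detail_url_for("13 Going on 30.mp4"))
```

In the loop this would replace the two lower()/replace() calls, so ratingsGet() could take the finished URL instead of rebuilding it.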

HTML snippet from the details page:

<table  summary="13 Going on 30 Details and Credits">
<tr >
<td >Runtime:</td>
<td >98 min</td>
</tr>
<tr >
<td >Rating:</td>
<td >
                                                                            Rated PG-13 for some sexual content and brief drug references.
                                                                    </td>
</tr>
<tr >
<td >Production:</td>
<td >Revolution Studios</td>
</tr>
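One way to see the problem without hitting the site is to parse a copy of the snippet. The class attributes did not survive in the snippet above, so this sketch assumes the rating row is <tr class="movie_rating"> and the value cell is <td class="data">, matching the selectors used in the code; treat those class names as assumptions. Searching for a div with that class finds nothing because the class sits on a table row.

```python
from bs4 import BeautifulSoup

# Assumed markup: the class names are inferred from the selectors used
# in the question, not copied from the live page.
SNIPPET = """
<table summary="13 Going on 30 Details and Credits">
<tr><td>Runtime:</td><td class="data">98 min</td></tr>
<tr class="movie_rating"><td>Rating:</td>
<td class="data">
    Rated PG-13 for some sexual content and brief drug references.
</td></tr>
</table>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# No <div> carries the class, so this is an empty result
divs = soup.find_all("div", {"class": "movie_rating"})

# The class is on the <tr>, so searching for that tag succeeds
row = soup.find("tr", {"class": "movie_rating"})
rating = row.find("td", {"class": "data"}).text.strip()

print(divs)
print(rating)
```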

Answer:

As you said, you need to look for a "tr", not a "div". I would also suggest the following:

Use find instead of find_all; you only need the first matching row.
If the result of find is not None, call find on it again to get only the data cell's text, like this:

g_data.find("td", { "class": "data" }).text

The general code will look something like this:

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link = "https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    g_data = soup.find("tr", {"class": "movie_rating"})

    # Check if that tr exists
    if g_data is not None:
        g_data = g_data.find("td", { "class": "data" })

    # Check if the td inside of it exists
    if g_data is not None:
        return g_data.text.strip()
    return "Failed"
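The find chain and its two None checks can also be collapsed into one CSS selector: select_one() returns the first match or None. A minimal sketch, assuming the class names movie_rating and data used in this answer; rating_from_html is a hypothetical name, and it takes HTML text so it can be tried without a network request. (The question's commented-out selector ends in span, which would only match if the page wrapped the text in a <span>; the snippet shows the text directly inside the <td>.)

```python
from typing import Optional

from bs4 import BeautifulSoup


def rating_from_html(html: str) -> Optional[str]:
    """Return the rating cell's text, or None if the row is missing.

    Assumes the class names "movie_rating" (row) and "data" (cell).
    """
    soup = BeautifulSoup(html, "html.parser")
    # Descendant selector: a td.data anywhere inside tr.movie_rating
    cell = soup.select_one("tr.movie_rating td.data")
    return cell.get_text(strip=True) if cell is not None else None


sample = '<table><tr class="movie_rating"><td>Rating:</td><td class="data"> PG-13 </td></tr></table>'
print(rating_from_html(sample))
print(rating_from_html("<table></table>"))
```

Inside ratingsGet() this could be called as rating_from_html(detail_page.text), returning "Failed" when the result is None. Keeping the parsing separate from requests.get() also makes it easy to test against saved pages.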

Answer:

I was just searching for the wrong element: the rating is in a tr, not a div.

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link = "https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
    g_data = soup.find_all("tr", {"class": "movie_rating"})
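
    # Check for a match before indexing; g_data[0] raises IndexError on an empty list
    if g_data:
        # strip() with no argument also removes the surrounding newlines
        print(g_data[0].text.strip())
        return g_data[0].text.strip()
    return "Failed"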
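This version returns the text of the whole row, so the result includes the "Rating:" label, and .text keeps the newlines and indentation between the cells. A minimal sketch of tidying that up with get_text(), whose separator and strip arguments trim each text fragment and join the non-empty ones; the markup is an assumed, shortened version of the snippet in the question.

```python
from bs4 import BeautifulSoup

# Assumed markup, shortened from the question's snippet
ROW_HTML = """
<table><tr class="movie_rating">
<td>Rating:</td>
<td class="data">
        Rated PG-13 for some sexual content and brief drug references.
</td>
</tr></table>
"""

soup = BeautifulSoup(ROW_HTML, "html.parser")
row = soup.find("tr", {"class": "movie_rating"})

# .text keeps every newline and indent between and inside the cells
raw = row.text

# get_text(" ", strip=True) strips each text fragment, drops the
# whitespace-only ones, and joins the rest with a single space
clean = row.get_text(" ", strip=True)

print(repr(raw))
print(clean)
```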