Scraping the rating of some reviews as pictures

Time:04-29

I am trying to scrape the ratings of some movie reviews, but each rating is not a number: it is made up of 10 images, each of which is either a full star or an empty star.

This is the website from where I want to scrape the data: https://www.cinemagia.ro/filme/avatar-17818/reviews/?pagina=1&order_direction=DESC

This is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.cinemagia.ro/filme/avatar-17818/reviews/?pagina=1&order_direction=DESC'
page = requests.get(url)

soup = BeautifulSoup(page.content, "html.parser")

rating=0
scraped_ratings = soup.find_all('span', class_='stelutze').find=("img")
for i in scraped_ratings:
    if "star_full.gif" in i.get("src"):
        rating += 1
print(rating)

Somebody helped me with this code, but it only gives the rating of the first review:

rating=0
rawRating = soup.find("span", {"class": "stelutze"}).find_all("img")
for i in rawRating:
    if "star_full.gif" in i.get("src"):
        rating += 1
print(rating)

I tried to change the code to this:

rating=0
count=0
rawRating = soup.find_all("span", {"class": "stelutze"}).find_all("img")
for i in rawRating:
    if "star_full.gif" in i.get("src"):
        rating += 1
    count += 1
    if count == 10:
        print(rating)
        rating=0
        count=0

But I get this error: AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

I think this is because I can't use two find_all in the same statement.

Any help?

CodePudding user response:

I believe this should solve your issue. I have not tested it, but I don't see why it shouldn't work.

Basically, find_all returns a list (a ResultSet) of every element it finds, and that list has no find_all method of its own, which is what the error is telling you. So first get the star container of every review on the page, then iterate over those containers and collect the images for each one, as you did before:

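rating = 0
count = 0
# One <span class="stelutze"> per review
rawRatings = soup.find_all("span", {"class": "stelutze"})
for i in rawRatings:
    # All star images belonging to this review
    rawRating = i.find_all("img")
    for j in rawRating:
        if "star_full.gif" in j.get("src"):
            rating += 1
        count += 1
        # After 10 stars the review is complete: print and reset
        if count == 10:
            print(rating)
            rating = 0
            count = 0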
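
Since each span should hold the stars of exactly one review, the count variable may not be needed at all: you can count the full stars inside each span directly. Here is a minimal sketch of that idea. It is not tested against the real site; SAMPLE_HTML is made-up markup standing in for the page, and ratings_per_review and the star_empty.gif file name are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption):
# two reviews, each with a <span class="stelutze"> holding star images.
SAMPLE_HTML = """
<div>
  <span class="stelutze">
    <img src="/img/star_full.gif"><img src="/img/star_full.gif">
    <img src="/img/star_empty.gif">
  </span>
</div>
<div>
  <span class="stelutze">
    <img src="/img/star_full.gif"><img src="/img/star_empty.gif">
    <img src="/img/star_empty.gif">
  </span>
</div>
"""


def ratings_per_review(html):
    """Return one rating (number of full stars) per review span."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = []
    # find_all returns a list-like ResultSet, so loop over it
    for span in soup.find_all("span", class_="stelutze"):
        # Count only the images whose src points at a full star
        full = sum(
            1
            for img in span.find_all("img")
            if "star_full.gif" in img.get("src", "")
        )
        ratings.append(full)
    return ratings


print(ratings_per_review(SAMPLE_HTML))
```

With the real page you would pass page.content instead of SAMPLE_HTML. This only works if the site really wraps each review's stars in its own span, so check that in the page source first.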

If you have any questions let me know
