python - selenium scraping rotten tomatoes for audience score

Time:06-10

I'm trying to scrape the audience score from Rotten Tomatoes. I was able to get the reviews, but I'm not sure how to use Selenium to get the "audiencescore" value.

Source:

<score-board
    audiencestate="upright"
    audiencescore="96"
    rating="R"
    skeleton="panel"
    tomatometerstate="certified-fresh"
    tomatometerscore="92"
    data-qa="score-panel"
>
    <h1 slot="title" data-qa="score-panel-movie-title">Pulp Fiction</h1>
    <p slot="info">1994, Crime/Drama, 2h 33m</p>
    <a slot="critics-count" href="/m/pulp_fiction/reviews?intcmp=rt-scorecard_tomatometer-reviews" data-qa="tomatometer-review-count">110 Reviews</a>
    <a slot="audience-count" href="/m/pulp_fiction/reviews?type=user&amp;intcmp=rt-scorecard_audience-score-reviews" data-qa="audience-rating-count">250,000  Ratings</a>
    <div slot="sponsorship" id="tomatometer_sponsorship_ad"></div>
</score-board>

Code:

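from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
url = 'https://www.rottentomatoes.com/m/pulp_fiction'
driver.get(url)

# Prints the visible text of the ratings-count link, not the audience score
print(driver.find_element(By.CSS_SELECTOR, 'a[slot="audience-count"]').text)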

Answer:

The audience score is the value of the audiencescore attribute, not the text of any node, which is why calling .text can't return it. Instead, select the element with the right locator and then call get_attribute(). The following expression works:

from selenium.webdriver.common.by import By

print(driver.find_element(By.CSS_SELECTOR, '#topSection score-board').get_attribute('audiencescore'))
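To see the attribute-versus-text distinction without a browser, here is an offline sketch that runs Python's standard html.parser over a trimmed copy of the markup from the question. ScoreBoardParser is a hypothetical name used only for illustration. The parser's handle_data() only ever receives text nodes, so 96 shows up solely among the attribute values, while the nearby text is the ratings count.

```python
from html.parser import HTMLParser

# Trimmed copy of the <score-board> markup shown in the question.
SNIPPET = """
<score-board audiencestate="upright" audiencescore="96" rating="R"
             skeleton="panel" tomatometerstate="certified-fresh"
             tomatometerscore="92" data-qa="score-panel">
  <h1 slot="title" data-qa="score-panel-movie-title">Pulp Fiction</h1>
  <a slot="audience-count" data-qa="audience-rating-count">250,000  Ratings</a>
</score-board>
"""


class ScoreBoardParser(HTMLParser):
    """Hypothetical helper: keeps the <score-board> attributes and the
    text of the audience-count link."""

    def __init__(self):
        super().__init__()
        self.board_attrs = {}
        self.count_text = ""
        self._in_count_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "score-board":
            self.board_attrs = attrs
        elif tag == "a" and attrs.get("slot") == "audience-count":
            self._in_count_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_count_link = False

    def handle_data(self, data):
        # Only text nodes reach this method; attribute values never do.
        if self._in_count_link:
            self.count_text += data


parser = ScoreBoardParser()
parser.feed(SNIPPET)
parser.close()
print(parser.board_attrs["audiencescore"])  # 96, read from an attribute
print(parser.count_text)                    # 250,000  Ratings, a text node
```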

Answer:

Try this:

1- Get the score-board element

2- Read the audiencescore attribute from that element

from selenium.webdriver.common.by import By

audiencescore = driver.find_element(By.CSS_SELECTOR, 'score-board').get_attribute('audiencescore')
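The two steps can be wrapped in a small helper that also turns the attribute string into a number. This is a minimal sketch, assuming a recent Selenium 4 release (where the find_element_by_* methods have been removed) and a page that is already loaded. read_audience_score is a hypothetical name, and the driver is passed in so the function works with any object exposing find_element(by, value).

```python
from typing import Optional

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR, so this
# helper can be defined without importing Selenium.
CSS_SELECTOR = "css selector"


def read_audience_score(driver) -> Optional[int]:
    """Hypothetical helper wrapping the two steps above.

    `driver` is anything with Selenium's find_element(by, value) method,
    e.g. a webdriver.Firefox() instance that has already loaded the page.
    """
    # Step 1: locate the <score-board> custom element.
    board = driver.find_element(CSS_SELECTOR, "score-board")
    # Step 2: read the attribute; .text would not include it.
    raw = board.get_attribute("audiencescore")
    # get_attribute() returns None if the attribute is missing; treat "" the same way.
    if not raw:
        return None
    return int(raw)
```

After driver.get(url), calling print(read_audience_score(driver)) should print the score as an integer. A non-numeric attribute value would raise ValueError from int(), which is left uncaught on purpose so a changed page layout fails loudly.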

Answer:

You were close. To extract the value of the audiencescore attribute, i.e. 96, you should ideally use WebDriverWait with visibility_of_element_located(), with either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://www.rottentomatoes.com/m/pulp_fiction")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "score-board.scoreboard"))).get_attribute("audiencescore"))
    
  • Using XPATH:

    driver.get("https://www.rottentomatoes.com/m/pulp_fiction")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//score-board[@class='scoreboard']"))).get_attribute("audiencescore"))
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    96
    

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".
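visibility_of_element_located() waits until the element is displayed, which says nothing about its attributes. If the score attribute were filled in by script after the element appears (an assumption; the page may not behave this way), a condition on the attribute itself would be a tighter wait. WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value once done, so a minimal sketch of such a condition could look like this. attribute_is_populated is a hypothetical name.

```python
from typing import Callable, Union

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def attribute_is_populated(css: str, name: str) -> Callable[[object], Union[str, bool]]:
    """Hypothetical custom wait condition for WebDriverWait.until().

    until() calls the returned function with the driver repeatedly; a truthy
    return value ends the wait and is handed back to the caller.
    """
    def _condition(driver) -> Union[str, bool]:
        # WebDriverWait ignores NoSuchElementException by default, so if the
        # element is not there yet, polling simply continues.
        value = driver.find_element(CSS_SELECTOR, css).get_attribute(name)
        return value or False

    return _condition


# Usage (assumes driver has loaded the page and WebDriverWait is imported):
# score = WebDriverWait(driver, 20).until(attribute_is_populated("score-board", "audiencescore"))
```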
