python - selenium scraping rotten tomatoes for audience score

I'm trying to scrape the audience score from Rotten Tomatoes. I was able to get the reviews, but I'm not sure how to use Selenium to get the "audiencescore" value.

Source:

<score-board
audiencestate="upright"
audiencescore="96"

rating="R"
skeleton="panel"
tomatometerstate="certified-fresh"
tomatometerscore="92"
data-qa="score-panel"
                >
<h1 slot="title"  data-qa="score-panel-movie-title">Pulp Fiction</h1>
<p slot="info" >1994, Crime/Drama, 2h 33m</p>
<a slot="critics-count" href="/m/pulp_fiction/reviews?intcmp=rt-scorecard_tomatometer-reviews"  data-qa="tomatometer-review-count">110 Reviews</a>
<a slot="audience-count" href="/m/pulp_fiction/reviews?type=user&amp;intcmp=rt-scorecard_audience-score-reviews"  data-qa="audience-rating-count">250,000  Ratings</a>
<div slot="sponsorship" id="tomatometer_sponsorship_ad"></div>
                </score-board>

Code:

from selenium import webdriver

driver = webdriver.Firefox()
url = 'https://www.rottentomatoes.com/m/pulp_fiction'
driver.get(url)

print(driver.find_element_by_css_selector('a[slot=audience-count]').text)

CodePudding user response：

audiencescore is an attribute of the <score-board> element, not a text node, so .text cannot return its value. Select the element with the right locator and call get_attribute() on it. The following expression works:

from selenium.webdriver.common.by import By

print(driver.find_element(By.CSS_SELECTOR, '#topSection score-board').get_attribute('audiencescore'))
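To make the attribute-versus-text distinction concrete without starting a browser, here is a minimal sketch that parses a trimmed copy of the markup from the question with the standard library's html.parser. The class name ScoreBoardParser and the trimmed HTML string are made up for illustration; this is not how Selenium works internally, it only shows where each value lives in the markup.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup posted in the question.
HTML = """
<score-board audiencestate="upright" audiencescore="96" tomatometerscore="92">
<h1 slot="title">Pulp Fiction</h1>
<a slot="audience-count">250,000  Ratings</a>
</score-board>
"""


class ScoreBoardParser(HTMLParser):
    """Hypothetical helper: records <score-board> attributes and the audience-count link text."""

    def __init__(self):
        super().__init__()
        self.attributes = {}
        self.audience_count_text = ""
        self._in_audience_count = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "score-board":
            # audiencescore lives here, on the opening tag itself.
            self.attributes = attrs
        elif tag == "a" and attrs.get("slot") == "audience-count":
            self._in_audience_count = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_audience_count = False

    def handle_data(self, data):
        # Text nodes are the kind of content that .text reads.
        if self._in_audience_count:
            self.audience_count_text += data


parser = ScoreBoardParser()
parser.feed(HTML)
print(parser.attributes["audiencescore"])  # value stored as an attribute
print(parser.audience_count_text)          # value stored as a text node
```

The score only ever appears in the attribute dictionary, which is why the original selector on a[slot=audience-count] returned the ratings count rather than the score.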

CodePudding user response：

Try this:

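1- Get the score-board element

2- Get the audiencescore attribute from that element

# find_element_by_css_selector was removed in Selenium 4.3; this form needs: from selenium.webdriver.common.by import By
audiencescore = driver.find_element(By.CSS_SELECTOR, 'score-board').get_attribute('audiencescore')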

CodePudding user response：

You were close. To extract the value of the audiencescore attribute (i.e. the text 96), ideally wait for the element with WebDriverWait and visibility_of_element_located(), then call get_attribute() on it. You can use either of the following locator strategies:

Using CSS_SELECTOR:

driver.get("https://www.rottentomatoes.com/m/pulp_fiction")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "score-board.scoreboard"))).get_attribute("audiencescore"))

Using XPATH:

driver.get("https://www.rottentomatoes.com/m/pulp_fiction")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//score-board[@class='scoreboard']"))).get_attribute("audiencescore"))

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console Output:
```
96
```
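Note that get_attribute() returns a string (or None when the attribute is absent), so the 96 above is text, not a number. If you need an integer, a small wrapper along these lines may help. This is only a sketch: score_from_element and FakeElement are hypothetical names, and FakeElement merely stands in for a Selenium WebElement so the example runs without a browser. It also assumes the attribute might be missing or empty on some pages.

```python
from typing import Optional


def score_from_element(element, name: str = "audiencescore") -> Optional[int]:
    """Return the named attribute as an int, or None if it is missing or not numeric.

    `element` can be any object exposing get_attribute(name),
    such as a Selenium WebElement.
    """
    raw = element.get_attribute(name)
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # Covers an empty string or any other non-numeric value.
        return None


class FakeElement:
    """Hypothetical stand-in for a WebElement, used only for this demo."""

    def __init__(self, attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        return self._attrs.get(name)


element = FakeElement({"audiencescore": "96", "tomatometerscore": "92"})
print(score_from_element(element))                      # audience score as int
print(score_from_element(element, "tomatometerscore"))  # critics score as int
print(score_from_element(FakeElement({})))              # attribute absent
```

With a real driver you would pass the element returned by find_element() or by the WebDriverWait call instead of FakeElement.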

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".