How to extract all the Google reviews from Google Maps

I need to scrape all the Google reviews. There are 90,564 reviews on my page. However, the code I wrote scrapes only the top 9 reviews; the other reviews are not scraped.

The code is given below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# specify the url of the business page on Google
url = 'https://www.google.com/maps/place/ISKCON temple Bangalore/@13.0098328,77.5510964,15z/data=!4m7!3m6!1s0x0:0x7a7fb24a41a6b2b3!8m2!3d13.0098328!4d77.5510964!9m1!1b1'

# create an instance of the Chrome driver
driver = webdriver.Chrome()

# navigate to the specified url
driver.get(url)

# Wait for the reviews to load
wait = WebDriverWait(driver, 20) # increased the waiting time
review_elements = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'wiI7pd')))

        
# extract the text of each review
reviews = [element.text for element in review_elements]

# print the reviews
print(reviews)

# close the browser
driver.quit()

What should I change in the code to extract all the reviews?

Answer:

I think you'll need to scroll down first, and then get all the reviews.

scroll_value = 230
driver.execute_script('window.scrollBy(0, ' + str(scroll_value) + ')')  # scroll down by scroll_value pixels

# get the current scroll position on the y axis
scroll_Y = driver.execute_script('return window.scrollY')

That might be because the remaining review elements are not loaded until you scroll to them.

Since there are over 90,000 reviews, you might consider scrolling down a little, collecting the reviews that have loaded, and repeating.
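The scroll, collect, repeat idea can be sketched independently of Selenium. This is a minimal sketch, assuming two hypothetical callables: `get_visible()` returns the (name, text) pairs currently loaded on the page, and `scroll_once()` scrolls a bit further. Instead of relying on the total count, it stops once several rounds in a row bring no new reviews, which also ends the loop if the page stops loading more.

```python
from typing import Callable, Dict, List, Tuple

Review = Tuple[str, str]  # (reviewer name, review text)


def scroll_and_collect(
    get_visible: Callable[[], List[Review]],
    scroll_once: Callable[[], None],
    max_idle_rounds: int = 3,
) -> List[Review]:
    """Alternate between collecting visible reviews and scrolling.

    Stops after `max_idle_rounds` consecutive rounds add no new reviews.
    """
    seen: Dict[Review, None] = {}  # dict as an ordered set: first-seen order, no repeats
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        before = len(seen)
        for review in get_visible():
            seen.setdefault(review, None)
        # reset the counter whenever this round found something new
        idle_rounds = idle_rounds + 1 if len(seen) == before else 0
        scroll_once()
    return list(seen)


# Demo with a fake page: 25 reviews, 10 visible at a time, window moves 4 per scroll.
all_reviews = [(f"user{i}", f"review {i}") for i in range(25)]
offset = [0]  # mutable holder so the fake scroll can move the visible window


def fake_visible() -> List[Review]:
    return all_reviews[offset[0]:offset[0] + 10]


def fake_scroll() -> None:
    offset[0] = min(offset[0] + 4, len(all_reviews) - 10)


collected = scroll_and_collect(fake_visible, fake_scroll)
print(len(collected))  # 25
```

With Selenium, `get_visible` could read `element.text` from the `wiI7pd` elements used in the question, and `scroll_once` could run the `window.scrollBy` script above. On Google Maps the reviews usually sit in their own scrollable panel, so you may need to scroll that element rather than the window.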

Resource: https://stackoverflow.com/a/74508235/20443541

Answer:

Here is working code for this; run it after opening the URL with driver.get(url):

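    import time

    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # locators; Google Maps class names are obfuscated and may change
    totalRev = "div div.fontBodySmall"  # label with the total review count
    username = ".d4r55"                 # reviewer name
    reviews = "wiI7pd"                  # review text

    wait = WebDriverWait(driver, 20)

    # read the total count, e.g. "90,564 reviews" -> "90564"
    totalRevCount = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, totalRev))).get_attribute("textContent").split(' ')[0].replace(',','').replace('.','')
    print("totalRevCount - ", totalRevCount)

    # click the review-count label before scrolling with the arrow keys
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, totalRev))).click()

    mydict = {}  # username -> review text; dict keys prevent duplicates
    found = 0

    while found < int(totalRevCount):

        review_elements = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, reviews)))
        reviewer_names = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, username)))

        found = len(mydict)
        for rev, name in zip(review_elements, reviewer_names):
            mydict[name.text] = rev.text
            # a review without text (rating only) ends the loop
            if len(rev.text) == 0:
                found = int(totalRevCount) + 1
                break

        # send 8 down-arrow key presses to scroll the list
        for i in range(8):
            ActionChains(driver).key_down(Keys.ARROW_DOWN).perform()

        print("found - ", found)

        print(mydict)

        # give newly loaded reviews time to render
        time.sleep(2)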
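The review-count parsing on the `totalRevCount` line assumes the label starts with the number, e.g. "90,564 reviews". A slightly more defensive version, sketched here as a standalone helper (the name `parse_review_count` is hypothetical), keeps only the digits of the first token, so "90,564", a locale-formatted "90.564", or a parenthesized "(90,564)" all become an integer, and leading whitespace in `textContent` is tolerated.

```python
import re


def parse_review_count(label: str) -> int:
    """Turn a label such as '90,564 reviews' into the integer 90564."""
    tokens = label.split()  # split() also ignores leading/trailing whitespace
    if not tokens:
        raise ValueError("empty review-count label")
    # drop thousands separators (',' or '.') and any other non-digit characters
    digits = re.sub(r"\D", "", tokens[0])
    if not digits:
        raise ValueError(f"no number found in {label!r}")
    return int(digits)


print(parse_review_count("90,564 reviews"))  # 90564
```

In the answer's code it would take the `get_attribute("textContent")` result in place of the `.split(' ')[0].replace(...)` chain, and the later `int(totalRevCount)` calls would then be unnecessary.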

Explanation -

1. Get the locators for the username and the review text. Each username and review are stored as a key-value pair in a dictionary, which keeps the results free of duplicates.
2. Get the total number of reviews/ratings shown for the location.
3. Get the username and review for the currently visible part of the page and store them in the dictionary.
4. Scroll down the page and wait a few seconds.
5. Get the usernames and reviews again and add them to the dictionary; only new ones are added. Steps 4 and 5 repeat until the dictionary holds as many entries as the total count.
6. As soon as a review with no text (rating only) is found, the loop ends and you have your results.
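The merge-and-stop step described above can be written as a pure function, so the stopping rule is easy to check without a browser. In this sketch the function name `merge_visible` and the `stop_at_empty` flag are hypothetical; `stop_at_empty=False` corresponds to dropping the `if` check mentioned in the note below. As in the answer's code, the text-less review is stored before the loop stops.

```python
from typing import Dict, Iterable, Tuple


def merge_visible(
    collected: Dict[str, str],
    visible: Iterable[Tuple[str, str]],
    stop_at_empty: bool = True,
) -> bool:
    """Add (username, review text) pairs to `collected`.

    Returns True when a review without text (rating only) is seen and
    stop_at_empty is set, i.e. when the scraping loop should end.
    """
    for name, text in visible:
        # keyed by username, as in the answer: a repeated name overwrites
        # the earlier entry instead of creating a duplicate
        collected[name] = text
        if stop_at_empty and not text:
            return True
    return False


# Made-up sample data: a first screen, then a scroll that repeats one review.
collected_reviews: Dict[str, str] = {}
first = merge_visible(collected_reviews, [("user1", "Great place"), ("user2", "Crowded on weekends")])
second = merge_visible(collected_reviews, [("user2", "Crowded on weekends"), ("user3", "Beautiful"), ("user4", "")])
print(first, second, len(collected_reviews))  # False True 4
```

Keying by username mirrors the answer, but two reviewers who share a display name would overwrite each other; keying by the (username, text) pair is an alternative if that matters.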

NOTE - If you want all reviews, whether or not they contain text, you can remove the "if" block inside the for loop.

Leave a comment if you have any questions.

Please consider upvoting and accepting if this answer helps you. Thank you.