I am webscraping Glassdoor.com for company reviews using Python.
Currently, I am using Beautiful Soup and grequests. This works fine for all the fields I need, except for the "Advice to Management" section, which only loads once the Continue Reading button is pressed. See the example below for this page of reviews:
[Images: continue reading button; expanded review]
There are no changes to the URL as far as I can tell, but there is a JS click-event being fired in the console:
Event: EiReviews: Click [continueReading-71858088]
I found tutorials online for Selenium WebDriver, such as this one, and I wrote this code:
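from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service("C:\\chromedriver.exe"))
driver.get("https://www.glassdoor.com/Reviews/Alteryx-Reviews-E351220.htm")
# Locate the button only; .click() returns None, so its result must not be stored in btn
btn = driver.find_element(By.CLASS_NAME, "v2__EIReviewDetailsV2__continueReading")
# Trigger the click through JavaScript
driver.execute_script("arguments[0].click();", btn)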
I need something that scales better, as this takes ~20 seconds to open Chrome and click a single button. I need to be able to click every "Continue Reading" button on the page, as my end goal is to scrape every review for ~1,000 companies.
CodePudding user response:
Looking at the HTML of the page, you can see that right before the <div id="Container"> element there is a script element starting with window.appCache={....
It contains the complete reviews in a dictionary-like format, including the text that appears when you click on Continue Reading. For example:
"summary":"Great place to work, been here 4 years",
"summaryOriginal":null,"advice":"Don't rush too finish a project"