Hello, I am using Selenium WebDriver for Chrome in Python and trying to scrape some data from https://www.google.com/travel/things-to-do
I am focused on the description of each place.
To get to each individual description, I have to click on the attraction and save the HTML to a list for future parsing with BeautifulSoup.
Every click refreshes the page, so I was thinking about somehow counting all the attractions that are displayed and then clicking each attraction in a loop, saving its description.
Does anybody have an idea how to approach this? Here is simple code that gets you to the place where I am stuck:
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = webdriver.ChromeOptions()
#chrome_options.headless = True
chrome_options.add_argument('--incognito')
#chrome_options.add_argument('--headless')
s = Service(ChromeDriverManager().install())
# Pass only the Service object; executable_path is redundant alongside it
driver = webdriver.Chrome(options=chrome_options, service=s)
driver.get("https://www.google.com/travel/things-to-do/see-all?dest_mid=/m/081m_&dest_state_type=sattd&dest_src=yts&q=Warszawa#ttdm=52.227486_21.004941_13&ttdmf=%2Fm%2F0862m")
# If you are not running webdriver in incognito mode you might skip the below button since it goes through accepting cookies
button = driver.find_element(By.XPATH, "/html/body/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button")
button.click()
time.sleep(1)
objects = driver.find_elements(By.CLASS_NAME, 'f4hh3d')
for k in objects:
    k.click()
    time.sleep(5)
CodePudding user response:
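For each attraction index, you can click the attraction to open its details, read the description, close the details, then fetch the list of attractions again and move on to the next index. Fetching the list again matters because the page re-renders after every click, so element references found earlier may no longer be usable.
Something like this should work: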
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumes `driver` is the Chrome driver created in the question's code
driver.get("https://www.google.com/travel/things-to-do/see-all?dest_mid=/m/081m_&dest_state_type=sattd&dest_src=yts&q=Warszawa#ttdm=52.227486_21.004941_13&ttdmf=%2Fm%2F0862m")
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.f4hh3d")))
time.sleep(1)
attractions = driver.find_elements(By.CSS_SELECTOR, 'div.f4hh3d')
for i in range(len(attractions)):
    # Re-query on every pass: references found before a click may be stale
    attractions = driver.find_elements(By.CSS_SELECTOR, 'div.f4hh3d')
    attractions[i].click()
    description = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div[jsname="BEZjkb"] div[jsname="bN97Pc"]'))).text
    # do with the description what you want
    # close the attraction by clicking the button
    driver.find_element(By.CSS_SELECTOR, 'div.reh1ld button').click()
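
The pattern in the loop above (count once, then fetch the list again before every click) can also be pulled out into a small helper so that it can be tried without a browser. This is only a sketch: the name collect_by_index and its three callback parameters are made up for illustration and are not part of Selenium.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def collect_by_index(
    find_items: Callable[[], Sequence[T]],
    open_and_read: Callable[[T], str],
    close_detail: Callable[[], None],
) -> List[str]:
    """Visit every item by position, re-querying the list on each pass.

    Hypothetical helper: the callbacks stand in for the Selenium calls
    (find the attractions, click one and read its text, close the panel).
    """
    results: List[str] = []
    count = len(find_items())  # count the items once, up front
    for i in range(count):
        items = find_items()  # fresh references after the page re-rendered
        if i >= len(items):
            # The list got shorter since we counted; stop instead of
            # raising IndexError.
            break
        results.append(open_and_read(items[i]))
        close_detail()
    return results


if __name__ == "__main__":
    # Stand-in data instead of a real page.
    names = ["Old Town", "Royal Castle", "Lazienki Park"]
    print(collect_by_index(lambda: names, lambda n: n + " description", lambda: None))
```

With the names from the answer, find_items could be lambda: driver.find_elements(By.CSS_SELECTOR, 'div.f4hh3d'), open_and_read could click the element and return the .text of the element that wait.until(...) yields, and close_detail could click 'div.reh1ld button'. The returned list then holds one description per attraction, ready to be saved or parsed later.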