Python, Selenium WebDriver (Chrome): obtain the page source from inside many web elements

Time:12-27

Hello, I am using Selenium WebDriver for Chrome in Python and trying to scrape some data from https://www.google.com/travel/things-to-do

Here I am focused on the place description that is shown for each attraction.

To get to each individual description, I have to click the attraction and save its HTML to a list for later parsing with BeautifulSoup.
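Once the HTML strings are saved to a list (for example with Selenium's `element.get_attribute("outerHTML")`), the later parsing step could look like the minimal sketch below. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages, and it assumes the description sits in an element with `jsname="bN97Pc"`, the attribute value used in the answer's CSS selector; Google can change that markup at any time. With BeautifulSoup, the equivalent lookup would be `BeautifulSoup(html, "html.parser").select_one('div[jsname="bN97Pc"]').get_text()`.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _TextInAttr(HTMLParser):
    """Collects the text inside the first element whose attr_name == attr_value."""

    def __init__(self, attr_name: str, attr_value: str) -> None:
        super().__init__()
        self.target = (attr_name, attr_value)
        self.depth = 0      # > 0 while we are inside the target element
        self.done = False   # True once the target element has closed
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1           # nested tag inside the target
        elif self.target in attrs:
            self.depth = 1            # entered the target element

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags (<br/>) carry no text and do not nest

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True      # target closed: stop collecting

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_description(html: str, attr_name: str = "jsname",
                        attr_value: str = "bN97Pc") -> Optional[str]:
    """Return the whitespace-normalised text of the target element, or None."""
    parser = _TextInAttr(attr_name, attr_value)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None
```

Applied to the saved list, this becomes `descriptions = [extract_description(h) for h in saved_html]`, where `saved_html` is a hypothetical name for the list built while clicking through the attractions.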

Every click refreshes the page, so I was thinking of somehow counting all the attractions that are displayed and then clicking each attraction in a loop, saving its description.

Does anybody have an idea how to approach this? Here is simple code that gets you to the place where I am stuck:

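import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--incognito')
# chrome_options.add_argument('--headless')  # uncomment to run without a visible window

# webdriver-manager downloads a matching chromedriver; pass it only via Service,
# since executable_path is deprecated in Selenium 4 and should not be combined with it
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=chrome_options)
driver.get("https://www.google.com/travel/things-to-do/see-all?dest_mid=/m/081m_&dest_state_type=sattd&dest_src=yts&q=Warszawa#ttdm=52.227486_21.004941_13&ttdmf=%2Fm%2F0862m")

# If you are not running in incognito mode you may be able to skip this button, since it only accepts cookies
button = driver.find_element(By.XPATH, "/html/body/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button")
button.click()
time.sleep(1)

# Each element with class f4hh3d is one attraction card
objects = driver.find_elements(By.CLASS_NAME, 'f4hh3d')

for k in objects:
    # Stuck here: each click re-renders the page, so the remaining elements in `objects` may go stale
    k.click()
    time.sleep(5)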
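In the loop above, the elements in `objects` are located once, before the first click; if a click re-renders the page, Selenium can raise `StaleElementReferenceException` when a later element is used. Besides re-locating the list by index, a generic option is to redo the lookup and the click together and retry when that exception occurs. Below is a minimal, library-agnostic sketch; the `retry` helper and its parameters are hypothetical, not part of Selenium.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T],
          exceptions: Tuple[Type[BaseException], ...],
          attempts: int = 3,
          delay: float = 0.5) -> T:
    """Call `action`, retrying up to `attempts` times on the given exceptions.

    `action` should look the element up *and* act on it, so that every
    retry works with a fresh reference instead of the stale one.
    """
    for attempt in range(attempts):
        try:
            return action()
        except exceptions:
            if attempt == attempts - 1:
                raise              # out of attempts: surface the last error
            time.sleep(delay)      # give the page a moment to finish re-rendering
    raise ValueError("attempts must be at least 1")
```

With Selenium, the exception to pass would be `selenium.common.exceptions.StaleElementReferenceException`, and the action would re-locate before clicking, e.g. `retry(lambda: driver.find_elements(By.CLASS_NAME, 'f4hh3d')[i].click(), (StaleElementReferenceException,))` inside a `for i in range(len(objects))` loop.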

Answer:

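For each attraction index, you can click the attraction to open its details, read the details, close the details panel, then fetch the list of attractions again (references taken before the click may be stale after the page re-renders) and move on to the next attraction.
Something like this should work: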

import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# `driver` is the Chrome driver created in the question's code
driver.get("https://www.google.com/travel/things-to-do/see-all?dest_mid=/m/081m_&dest_state_type=sattd&dest_src=yts&q=Warszawa#ttdm=52.227486_21.004941_13&ttdmf=%2Fm%2F0862m")
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.f4hh3d")))
time.sleep(1)
attractions = driver.find_elements(By.CSS_SELECTOR, 'div.f4hh3d')
for i in range(len(attractions)):
    # Locate the cards again on every pass: references taken before a click may be stale
    attractions = driver.find_elements(By.CSS_SELECTOR, 'div.f4hh3d')
    attractions[i].click()
    description = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div[jsname="BEZjkb"] div[jsname="bN97Pc"]'))).text
    # do with the description what you want
    # close the attraction by clicking the button
    driver.find_element(By.CSS_SELECTOR, 'div.reh1ld button').click()
    # wait until the attraction list is visible again before the next iteration
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.f4hh3d")))
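The loop in the answer can also be factored into a small driver-agnostic helper, which makes the re-locate-by-index idea easy to test without a browser. This is a minimal sketch; `collect_by_index` and its callback parameters are hypothetical names, not Selenium API.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def collect_by_index(locate: Callable[[], Sequence[T]],
                     open_item: Callable[[T], None],
                     read_detail: Callable[[], str],
                     close_item: Callable[[], None]) -> List[str]:
    """Open every item once and collect its detail text.

    `locate` is called again before every click, because on pages that
    re-render after a click, references obtained earlier may be stale.
    """
    results: List[str] = []
    count = len(locate())          # how many items are shown initially
    for i in range(count):
        items = locate()           # fresh references for this iteration
        if i >= len(items):        # the page now shows fewer items: stop early
            break
        open_item(items[i])
        results.append(read_detail())
        close_item()
    return results
```

Wired to Selenium, the callbacks could be `locate=lambda: driver.find_elements(By.CSS_SELECTOR, 'div.f4hh3d')`, `open_item=lambda el: el.click()`, `read_detail=lambda: wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div[jsname="BEZjkb"] div[jsname="bN97Pc"]'))).get_attribute('outerHTML')` and `close_item=lambda: driver.find_element(By.CSS_SELECTOR, 'div.reh1ld button').click()`. Returning `outerHTML` instead of `.text` keeps the raw HTML for later BeautifulSoup parsing, as the question intended.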