Python Selenium Crawler go into element and get details

Time:10-13

I'm trying to get details of all properties from the following website which has properties listed as elements:

https://www.altamirarealestate.com.cy/results/for-sale/flats/cyprus/35p009679979327046l33p17435142059772z9

I'm using Selenium in Python to scrape the elements' details, but once I locate an element I cannot click its link to open it in a new page and get the necessary information. Code below:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import webbrowser
import random
import time
import selenium.webdriver.support.ui as ui
from selenium.webdriver.support.wait import WebDriverWait 
from selenium.webdriver.support.select import Select
import csv
from csv import writer
from selenium.common.exceptions import ElementNotVisibleException, WebDriverException, NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

Link = 'https://www.altamirarealestate.com.cy/results/for-sale/flats/cyprus/35p009679979327046l33p17435142059772z9'

# MAIN
driver = webdriver.Chrome()
driver.maximize_window()


#Go to link
driver.get(Link)

#Accept cookies
time.sleep(2)
driver.find_element_by_xpath('//*[@id="onetrust-accept-btn-handler"]').click()
time.sleep(2)


#Load everything
while True:
    try:
        driver.find_element_by_xpath("//*[contains(@value,'View more')]").click()
        time.sleep(3)
    except Exception as no_more_properties:
        print('all properties expanded: ', no_more_properties)
        break

#Get properties
properties_list=driver.find_elements_by_xpath('//*[@class="minificha   "]')
print (len(properties_list))#25
time.sleep(2)

#Get each property link
property_url=set()
properties_details=[]

main_window_handle = driver.current_window_handle
for i in range(0,len(properties_list)):
    driver.switch_to_window(main_window_handle)
    property = properties_list[i]
    property_link = property.find_element_by_xpath('//a[@href="' + url + '"]')
    property_link.click()
    time.sleep(2)

    #Switch to property window
    window_after = driver.window_handles[1]
    driver.switch_to.window(window_after)

    #Get number of properties
    number_of_flats=driver.find_elements_by_xpath('//*[@class="lineainmu "]')
    print(len(number_of_flats))
    time.sleep(2)

    currentWindow = driver.current_window_handle
    for j in range(0,len(number_of_flats)):
        driver.switch_to_window(currentWindow)
        flat= number_of_flats[j]
        flat.click()
        time.sleep(2)
        
        #Switch to flat window
        window_after = driver.window_handles[1]
        driver.switch_to.window(window_after)

CodePudding user response:

Clicking a link on the first page opens a new tab. In Selenium, in cases like this you have to switch focus to the new window before you can interact with web elements on the newly opened page.

Once the task is done, it's important to close the tab and then switch back to the original window.
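The switch, close, and switch-back sequence can be wrapped in a small helper so the tab is closed even if scraping raises. This is only a sketch: `in_new_tab` and `newly_opened_handle` are hypothetical names, not part of Selenium. It relies only on `driver.window_handles`, `driver.current_window_handle`, `driver.switch_to.window()` and `driver.close()`, and it assumes the new tab already exists once `open_tab()` returns (on a slow page you would likely wait for the handle count to change first).

```python
from contextlib import contextmanager


def newly_opened_handle(handles_before, handles_after):
    """Return the one handle that exists after the click but not before it."""
    known = set(handles_before)
    new_handles = [h for h in handles_after if h not in known]
    if len(new_handles) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(new_handles)}")
    return new_handles[0]


@contextmanager
def in_new_tab(driver, open_tab):
    """Call open_tab(), switch to the tab it opened, then always close it and switch back."""
    original = driver.current_window_handle
    before = list(driver.window_handles)   # snapshot taken before the click
    open_tab()                             # e.g. lambda: ele.click()
    new_handle = newly_opened_handle(before, driver.window_handles)
    driver.switch_to.window(new_handle)
    try:
        yield new_handle
    finally:
        driver.close()                     # closes the tab that currently has focus
        driver.switch_to.window(original)  # hand focus back to the results page
```

Usage would look like `with in_new_tab(driver, lambda: ele.click()): ...`, with the scraping of the detail page inside the `with` body. Comparing handle lists avoids assuming that the new tab is always `window_handles[1]`.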

This may lead to a stale element reference if the web elements are not located again inside the loop.
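A minimal sketch of locating the elements again on every pass, instead of reusing references collected before the loop. `visit_each` and `handle_item` are hypothetical names used for illustration. The string `"xpath"` is the value of `By.XPATH`, written out here so the snippet needs no imports.

```python
def visit_each(driver, xpath, handle_item):
    """Look up the i-th match of `xpath` afresh on each pass and process it."""
    # Count the matches once; the element objects themselves are not kept.
    count = len(driver.find_elements("xpath", xpath))
    results = []
    for position in range(1, count + 1):  # XPath positions are 1-based
        # A fresh lookup gives a reference that is valid for the current DOM.
        element = driver.find_element("xpath", f"({xpath})[{position}]")
        results.append(handle_item(element))
    return results
```

Because each element is looked up immediately before it is used, a reference that went stale while another tab was open is never touched again.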

Code :

driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)

driver.get("https://www.altamirarealestate.com.cy/results/for-sale/flats/cyprus/35p009679979327046l33p17435142059772z9")

try:
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
except:
    pass

size = driver.find_elements(By.XPATH, "//div[@class='slick-list draggable']")
j = 1
org_windows_handle = driver.current_window_handle
for i in range(len(size)):
    ele = driver.find_element(By.XPATH, f"(//div[@class='slick-list draggable'])[{j}]")
    driver.execute_script("arguments[0].scrollIntoView(true);", ele)
    ele.click()
    all_handles = driver.window_handles
    driver.switch_to.window(all_handles[1])
    try:
        name = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#tituloFiltroTipo"))).text
        print(name)
    except:
        pass
    try:
        price = wait.until(EC.visibility_of_element_located((By.ID, "soloPrecio"))).text
        print(price)
    except:
        pass
    driver.close()
    driver.switch_to.window(org_windows_handle)
    j = j + 1

Imports :

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Output :

Flats - Egkomi, Nicosia
310,000
Flat - Strovolos, Nicosia
115,000
Flat - Agios Dometios, Nicosia
185,000
Flats - Aglantzia, Nicosia
765,000
Flat - Kaimakli, Nicosia
170,000
Flat - Kaimakli, Nicosia
280,000
Flat - Kaimakli, Nicosia
130,000
Flat - Germasogia, Limassol
410,000
Flat - Germasogeia, Limassol
285,000
Flat - Petrou & Pavlou, Limassol
230,000

Mixing implicit and explicit waits is not recommended. But in a few cases like this one, where find_element is used alongside explicit waits, it does no harm. Please comment out the implicit wait line and run the code. If it fails, uncomment it and try again.
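If you would rather not comment the line in and out by hand, one possible sketch is to switch the implicit wait off only around the explicit waits. `implicit_wait_disabled` is a hypothetical helper, not a Selenium API. The caller has to pass the value to restore (30 in the code above), since the helper does not read the current setting.

```python
from contextlib import contextmanager


@contextmanager
def implicit_wait_disabled(driver, restore_seconds):
    """Set the implicit wait to 0 for the duration of the block, then restore it."""
    driver.implicitly_wait(0)  # explicit waits now control the timing alone
    try:
        yield driver
    finally:
        driver.implicitly_wait(restore_seconds)  # value supplied by the caller
```

For example, `with implicit_wait_disabled(driver, 30): price = wait.until(...)` keeps the two kinds of wait from being active at the same time.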
