Extracting nested elements using Selenium

Time:11-24

I have code written with BeautifulSoup and am now exploring Selenium, but I cannot figure out how to extract data nested inside some HTML (I hope it is possible).

This is the bs4 code:

def get_data(link):
    soup1 = getdata(link)
    for one_offer in soup1.find_all('li', {'class': 'clearfix'}):
        # Get sqm:
        raw_sqm = one_offer.find('div', {'class': 'inline-group'})
        get_sqm = raw_sqm.get_text().split(',')[1].split()[0]
        sqm_check_value = if_area_not_speicified(get_sqm)
        sqm_area.append(float(sqm_check_value))

The above code takes in the link: [screenshot of the page]

one_offer is one block: in the image above, the red, green and blue rectangles each mark one block. For each block I take the area indicated by the red arrow and append it to a list.

How to convert this into Selenium code?

So far I have:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

PATH = '/Applications/chromedriver'
driver = webdriver.Chrome(PATH)

driver.get('https://www.imoti.net/bg/obiavi/r/prodava/sofia/?sid=hSrJhL')

variable = []

def testing_values():
    variable.append(driver.find_elements_by_class_name('clearfix'))

testing_values()
print(variable)

The testing_values function returns the following list:

[[<selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="5e3d2712-f453-4871-a43e-8d72d40e6a65")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="53a21fd3-495a-41d4-9382-ae61961209ed")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="56d80ac6-bfaa-48de-9e87-1d2f3c9a42a4")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="74362762-087e-4221-a4b7-cbdf10a16400")>, ...]]

*This list contains 30 items; I deleted some of them to keep the example short.

So I have a list of WebElement objects, but how do I extract the data from each one to get the area, as in the bs4 code?

Answer:

Your class-name lookup also matched an extra div with the class clearfix. So select only the li elements with that class, loop through them, run a relative XPath (starting with .//) on each one, and read the text values.

variable = []

def testing_values():
    variable.append([x.find_element_by_xpath(".//div[@class='real-estate-text']/header/div/h3/span[2]").text for x in driver.find_elements_by_xpath("//li[@class='clearfix']")])

testing_values()
print(variable)
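
The find_element_by_* and find_elements_by_* helpers used above were deprecated in Selenium 4 and removed in later 4.x releases, where the same lookups go through find_element / find_elements with a By locator. The following is only a sketch of the same idea in that style, not the answerer's code: the function name collect_area_texts and the two XPath constants are made up for illustration, and using find_elements for the nested lookup (so a block without an area is skipped instead of raising) is an added assumption.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:
    # Fallback so the helper can be tried with stub objects when Selenium
    # is not installed; "xpath" is the string value of By.XPATH.
    XPATH = "xpath"

# XPath expressions taken from the answer above.
OFFER_XPATH = "//li[@class='clearfix']"
AREA_XPATH = ".//div[@class='real-estate-text']/header/div/h3/span[2]"


def collect_area_texts(driver):
    """Return the area text of every offer block that has one."""
    texts = []
    for offer in driver.find_elements(XPATH, OFFER_XPATH):
        # The leading ".//" keeps the search inside this offer element.
        spans = offer.find_elements(XPATH, AREA_XPATH)
        if spans:  # skip blocks where the area span is missing
            texts.append(spans[0].text)
    return texts
```

With the driver from the question already on the results page, print(collect_area_texts(driver)) should give a flat list of strings rather than the nested list shown below.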

Outputs:

[['543 М2', '10 М2', '12 М2', '36 М2', '660 М2', '635 М2', '44 М2', '41 М2', '50 М2', '60 М2', '50 М2', '64 М2', '64 М2', '59 М2', '90 М2', '51 М2', '1053 М2', '72 М2', '66 М2', '78 М2', '65 М2', '52 М2', '75 М2', '68 М2', '62 М2', '72 М2', '90 М2', '78 М2', '74 М2', '57 М2']]
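
The bs4 version in the question stored the areas as floats, while the output above is a list of strings such as '543 М2'. A possible way to bridge that gap is sketched below; parse_area and parse_areas are hypothetical helper names, and the sketch assumes the number is always the first whitespace-separated token of the text.

```python
def parse_area(text):
    """Convert a string like '543 М2' to 543.0, or return None if it has no number."""
    parts = text.split()
    if not parts:
        return None
    try:
        # Accept a decimal comma as well as a decimal point.
        return float(parts[0].replace(',', '.'))
    except ValueError:
        return None


def parse_areas(texts):
    """Parse every text and drop the ones that could not be converted."""
    areas = []
    for text in texts:
        value = parse_area(text)
        if value is not None:
            areas.append(value)
    return areas
```

Because testing_values appends a whole list to variable, the strings sit one level down, so the call would be parse_areas(variable[0]).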