I am trying to fetch every div tag with a class of "someClass" from a website.
The website needs to be scrolled down to load new div elements, so I used Keys.PAGE_DOWN. That did scroll the page, but the data was still incomplete.
So I used:
elem = driver.find_element(By.TAG_NAME, "body")
no_of_pagedowns = 23
while no_of_pagedowns:
    elem.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.7)
    no_of_pagedowns -= 1
This scrolls until the entire HTML page has loaded,
but when I write the data to a file, only 20 div tags are written instead of hundreds ...
Complete Code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = 'https://divar.ir/s/tehran/buy-apartment/parand?price=200000000-450000000&non-negotiable=true&has-photo=true&q=خانه پرند'
driver.get(url)

# Scroll down repeatedly so the lazily loaded listings appear
elem = driver.find_element(By.TAG_NAME, "body")
no_of_pagedowns = 23
while no_of_pagedowns:
    elem.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.3)
    no_of_pagedowns -= 1

# Collect every listing card and write its text to a file, one numbered entry per line
datas = driver.find_elements(By.CLASS_NAME, 'kt-post-card__body')
with open('data.txt', 'w') as f:
    for counter, data in enumerate(datas, start=1):
        f.write(f'{counter}--> {data.text}\n')

driver.quit()
CodePudding user response:
To select only 20 <div> tags instead of hundreds, you can use list slicing with either of the following locator strategies:
Using CSS_SELECTOR:
elements = driver.find_elements(By.CSS_SELECTOR, "div.kt-post-card__body")[:20]
Using XPATH:
elements = driver.find_elements(By.XPATH, "//div[@class='kt-post-card__body']")[:20]
Ideally, you should induce WebDriverWait for visibility_of_all_elements_located() and use either of the following locator strategies:
Using CSS_SELECTOR:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.kt-post-card__body")))[:20]
Using XPATH:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='kt-post-card__body']")))[:20]
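Note: the WebDriverWait snippets above assume these imports, which are not shown in this answer:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC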
Update
To select all the <div>s, drop the slicing and use either of the following locator strategies:
Using CSS_SELECTOR:
elements = driver.find_elements(By.CSS_SELECTOR, "div.kt-post-card__body")
Using XPATH:
elements = driver.find_elements(By.XPATH, "//div[@class='kt-post-card__body']")
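To connect this back to the original script, the matched elements can then be written out one per line, a small sketch reusing the data.txt file from the question:
with open('data.txt', 'w') as f:
    for counter, element in enumerate(elements, start=1):
        f.write(f'{counter}--> {element.text}\n')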
CodePudding user response:
I checked the site, and as far as I can tell it loads its data as JSON through an API that uses a cursor. The cursor is a time value held in a variable called last-post-date; when you first open the site, this value is delivered as lastPostDate inside a JSON response. To fetch the data quickly, you can use the API directly: take the lastPostDate value from https://divar.ir/s/tehran/buy-apartment/parand?price=200000000-450000000&non-negotiable=true&has-photo=true&q=خانه پرند and update the last-post-date value in the JSON below with it.
{"json_schema":{"category":{"value":"apartment-sell"},"districts":{"vacancies":["427"]},"price":{"max":450000000,"min":200000000},"non-negotiable":true,"has-photo":true,"query":"خانه پرند"},"last-post-date":1647005920188580}
This updated JSON should be sent as a POST request to the API endpoint below: https://api.divar.ir/v8/search/1/apartment-sell
The response is a new JSON containing a last_post_date field; new queries can be made using this value, and the required listing data is also stored in this JSON.
This is just an idea, but it seemed to work when I tested it with Postman.
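A minimal sketch of that flow in Python with the requests library, assuming the endpoint accepts the payload exactly as shown above and that last_post_date appears at the top level of the response (neither the response structure nor the stopping condition is confirmed here):
import requests

API_URL = 'https://api.divar.ir/v8/search/1/apartment-sell'

# Payload copied from the JSON above; last-post-date is the pagination cursor.
payload = {
    "json_schema": {
        "category": {"value": "apartment-sell"},
        "districts": {"vacancies": ["427"]},
        "price": {"max": 450000000, "min": 200000000},
        "non-negotiable": True,
        "has-photo": True,
        "query": "خانه پرند",
    },
    "last-post-date": 1647005920188580,
}

for _ in range(5):  # fetch a few pages as a demonstration
    response = requests.post(API_URL, json=payload)
    response.raise_for_status()
    data = response.json()

    # The listings and the next cursor are somewhere in this JSON.
    print(data.get("last_post_date"))

    if "last_post_date" not in data:
        break
    # Feed the returned cursor back into the next request.
    payload["last-post-date"] = data["last_post_date"]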