I'm trying to extract URLs from a ul, but I only get the URL from the first li.
This is what my code looks like:
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
import time
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.etsy.com/search/handmade?q=marokaanse azilal vloerkleden&explicit=1&item_type=handmade&ship_to=NL")
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(., 'Accept')]"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"ul[class='wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container']")))
time.sleep(2)
urls=driver.find_elements(By.CSS_SELECTOR, "ul[class='wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container']")
for row,url in enumerate(urls):
    urli=url.find_element(by=By.TAG_NAME,value='a').get_attribute('href')
    print(urli)
driver.close()
What is the reason for this?
CodePudding user response:
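Your locator is wrong: it matches only the single ul element, so the loop runs once and find_element returns just the first a inside it. Use a locator that matches the links themselves instead: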
urls = driver.find_elements(By.CSS_SELECTOR,".wt-grid.wt-grid--block.wt-pl-xs-0.tab-reorder-container a")
for row,url in enumerate(urls):
    urli=url.get_attribute('href')
    print(row, end = " - ")
    print(urli)
Output:
... and so on; 48 links are printed in total.
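To see why the original locator yields a single URL, here is a small browser-free sketch that uses Python's standard xml.etree.ElementTree as a stand-in for Selenium. The markup and the class name `results` are made up for illustration. The point is only that matching the one ul and then taking its first a gives one link, while matching every a under the ul gives them all.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one <ul> holding three listing links.
HTML = """
<div>
  <ul class="results">
    <li><a href="/listing/1">one</a></li>
    <li><a href="/listing/2">two</a></li>
    <li><a href="/listing/3">three</a></li>
  </ul>
</div>
""".strip()

root = ET.fromstring(HTML)

# Same shape as the question: select the <ul> elements (there is only one),
# then take the first <a> inside each of them.
lists = root.findall(".//ul[@class='results']")
first_only = [ul.find(".//a").get("href") for ul in lists]

# Same shape as the fix: select every <a> that sits under the <ul>.
all_links = [a.get("href") for a in root.findall(".//ul[@class='results']//a")]

print(len(lists), first_only)      # 1 ['/listing/1']
print(len(all_links), all_links)   # 3 ['/listing/1', '/listing/2', '/listing/3']
```

In Selenium terms, the first pattern corresponds to find_elements on the ul followed by find_element on each match, and the second to a single find_elements call whose selector ends in a.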
CodePudding user response:
If you look carefully, you will see that the <li> items don't all load instantly: https://gifyu.com/image/S9dUS
The vast majority of Selenium-related problems are solved by waiting for the right element to load.
You can check whether one or more <li> items have loaded: if the length of the list of matched elements is greater than 1, continue; otherwise wait a few seconds.
Try adding an arbitrary sleep/delay here to see if it works:
urls=driver.find_elements(By.CSS_SELECTOR, "ul[class='wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container']")
time.sleep(10) # 10 second sleep
for row,url in enumerate(urls):
    urli=url.find_element(by=By.TAG_NAME,value='a').get_attribute('href')
    print(urli)
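The check described above (continue only once more than one <li> has loaded) can also be sketched as a polling loop instead of a fixed sleep. This is a hand-rolled sketch, not Selenium's own API: `wait_for_min_elements` and `FakeDriver` are hypothetical names, and the helper only assumes the driver exposes `find_elements(by, value)`, as Selenium's WebDriver does. With Selenium available, `WebDriverWait(driver, 20).until(lambda d: len(d.find_elements(By.CSS_SELECTOR, selector)) > 1)` should express the same idea.

```python
import time


def wait_for_min_elements(driver, by, value, minimum=2, timeout=20.0, poll=0.5):
    """Poll driver.find_elements(by, value) until at least `minimum` match.

    Returns the matched elements, or raises TimeoutError if `timeout`
    seconds pass before enough elements are present.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements(by, value)
        if len(found) >= minimum:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {len(found)} element(s) matched {value!r}")
        time.sleep(poll)


class FakeDriver:
    """Stand-in for a WebDriver: each lookup reveals one more element."""

    def __init__(self, total):
        self.total = total
        self.calls = 0

    def find_elements(self, by, value):
        self.calls += 1
        return [f"li-{i}" for i in range(min(self.calls, self.total))]


# "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
items = wait_for_min_elements(FakeDriver(total=5), "css selector", "ul li",
                              minimum=3, poll=0.01)
print(len(items))  # 3
```

With a real driver you would pass By.CSS_SELECTOR and a selector that matches the items you are waiting for, then read the href attributes once the call returns.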