Trying to get li urls from ul using selenium

Time:10-13

I'm trying to extract the URLs from a ul, but I only get the URL from the first li.

This is what my code looks like.

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
import time

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.etsy.com/search/handmade?q=marokaanse azilal vloerkleden&explicit=1&item_type=handmade&ship_to=NL")

WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(., 'Accept')]"))).click()

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"ul[class='wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container']")))

time.sleep(2)
urls=driver.find_elements(By.CSS_SELECTOR, "ul[class='wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container']")

for row,url in enumerate(urls):
    urli=url.find_element(by=By.TAG_NAME,value='a').get_attribute('href')
    print(urli)

driver.close()

What is the reason for this?

CodePudding user response:

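Your locator is wrong: it matches the <ul> itself, so find_elements returns only one element, and find_element on that element returns only its first <a>. Select the links inside the list instead: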

urls = driver.find_elements(By.CSS_SELECTOR,".wt-grid.wt-grid--block.wt-pl-xs-0.tab-reorder-container a")

for row,url in enumerate(urls):
    urli=url.get_attribute('href')
    print(row, end = " - ")
    print(urli)

Output:

0 - https://www.etsy.com/in-en/listing/1303674238/costum-moroccan-white-rug-moroccan?click_key=5450702e9543b87204f8367193a173c86636fa5a:1303674238&click_sum=dd26650b&ga_order=most_relevant&ga_search_type=handmade&ga_view_type=gallery&ga_search_query=marokaanse azilal vloerkleden&ref=search_in_grid-1-1&pro=1&frs=1&sts=1

1 - https://www.etsy.com/in-en/listing/1266669024/custom-fabulous-boujad-rug-authentic?click_key=ffdf555b9352959ef310bc137e76fa55de94e27a:1266669024&click_sum=d57ac2c6&ga_order=most_relevant&ga_search_type=handmade&ga_view_type=gallery&ga_search_query=marokaanse azilal vloerkleden&ref=search_in_grid-1-2&pro=1&frs=1&sts=1

2 - https://www.etsy.com/in-en/listing/1305094860/costum-moroccan-white-rug-moroccan?click_key=9b42c7a0d01cb3141220f82ca1e6671afd9bfa69:1305094860&click_sum=02d5fe15&ga_order=most_relevant&ga_search_type=handmade&ga_view_type=gallery&ga_search_query=marokaanse azilal vloerkleden&ref=search_in_grid-1-3&pro=1&frs=1&sts=1

3 - https://www.etsy.com/in-en/listing/1321859593/costum-moroccan-colorful-rug-moroccan?click_key=d39a5439ca92cfa27bc3427c9dd9a8a2ad0fc0bd:1321859593&click_sum=d6dea316&ga_order=most_relevant&ga_search_type=handmade&ga_view_type=gallery&ga_search_query=marokaanse azilal vloerkleden&ref=search_in_grid-1-4&pro=1&frs=1&sts=1

and so on; 48 links are printed in total.
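The difference between the two locators can be reproduced without a browser. The sketch below is only an analogy: it uses the standard library's xml.etree.ElementTree in place of Selenium, and the markup and example.com URLs are made up for illustration. `find` plays the role of `find_element` (first match only) and `findall` the role of `find_elements` (every match).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the search results list.
MARKUP = """
<ul class="grid">
  <li><a href="https://example.com/listing/1">one</a></li>
  <li><a href="https://example.com/listing/2">two</a></li>
  <li><a href="https://example.com/listing/3">three</a></li>
</ul>
"""

ul = ET.fromstring(MARKUP)

# Original approach: the locator matches the single <ul>, so the loop
# runs once and asks that one element for its first <a>.
matched = [ul]
first_only = [element.find(".//a").get("href") for element in matched]

# Suggested approach: match every <a> inside the <ul> directly.
all_links = [a.get("href") for a in ul.findall(".//a")]

print(len(first_only), "link from the <ul> locator")
print(len(all_links), "links from the descendant <a> locator")
```

With Selenium the same idea is expressed by the CSS selector above, which ends in ` a` so that each anchor is its own match.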

CodePudding user response:

If you watch the page closely, you will see that not all of the <li> items load instantly: https://gifyu.com/image/S9dUS

Most Selenium problems are solved by waiting for the right element to load.

You can check whether more than one <li> has loaded. If the length of the element list (call it list_elems) is greater than 1, continue; otherwise wait a few seconds.

Try adding an arbitrary sleep/delay here to see if it works:

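time.sleep(10) # 10 second sleep, done before the lookup so the items have time to load

# match the <li> items rather than the <ul>, so the loop runs once per item
urls=driver.find_elements(By.CSS_SELECTOR, "ul[class='wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container'] li")

for row,url in enumerate(urls):
    urli=url.find_element(by=By.TAG_NAME,value='a').get_attribute('href')
    print(urli)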
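The check described above (continue once more than one item has loaded, otherwise wait a little) can be sketched as a small polling helper instead of a fixed sleep. This is a minimal sketch, not part of Selenium: `wait_for_items` and its parameters are hypothetical names, and it only assumes a callable that returns a list.

```python
import time


def wait_for_items(find_items, minimum=2, timeout=10.0, poll=0.5):
    """Call find_items() repeatedly until it returns at least `minimum` items.

    find_items is any zero-argument callable returning a list (hypothetical
    helper; with Selenium it would wrap driver.find_elements).
    Raises TimeoutError if the count is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        items = find_items()
        if len(items) >= minimum:
            return items
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(items)} item(s) found after {timeout} seconds"
            )
        time.sleep(poll)  # short pause before looking again
```

With the driver from the question it might be called as `wait_for_items(lambda: driver.find_elements(By.CSS_SELECTOR, "ul li a"))`, where the shortened selector is only a placeholder for the real one. The loop stops as soon as enough items are present, so it usually finishes well before a fixed 10 second sleep would.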