python - extract img src address from selenium div element

Time:08-21

I am trying to extract the img src addresses from the following page markup.

  <div >
    <div >
    <div ><img src="https://t3a.coupangcdn.com/thumbnails/remote/212x212ex/image/vendor_inventory/6ca9/2e097d911efc291473d0c47052cdc8f42d7b7b8f2a3ebbb0ccc974d76fe4.jpg" alt="product"><div><button type="button" >
    </div></div>
    <div >
    <div >
    <img src="https://thumbnail11.coupangcdn.com/thumbnails/remote/212x212ex/image/retail/images/239519218793467-6edc7d92-4165-4476-a528-fa238ffeeeb6.jpg" alt="product"><div></div></div>

I tried to get them in the following way:

    ele = driver.find_elements_by_xpath("//div[@class='product-picture']/img")
    print(ele)

    >>>
    <selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="27ef8c33-624d-4166-9dc7-3a355c4dcc32")>
    <selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="a6d77107-fecf-4c84-a048-9b4bda39b9df")>
    <selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="1f62cb8b-df58-4f06-afe6-6c60cb572527")>

This prints WebElement objects, but what I want is the src address string of every img element on the page. Is there a way to extract the strings?

CodePudding user response:

    from selenium.webdriver.common.by import By

    images = driver.find_elements(By.XPATH, "//div[@class='product-picture']/img")
    for img in images:
        print(img.get_attribute("src"))

This will give you the expected output:

    https://t3a.coupangcdn.com/thumbnails/remote/212x212ex/image/vendor_inventory/6ca9/2e097d911efc291473d0c47052cdc8f42d7b7b8f2a3ebbb0ccc974d76fe4.jpg
    https://thumbnail11.coupangcdn.com/thumbnails/remote/212x212ex/image/retail/images/239519218793467-6edc7d92-4165-4476-a528-fa238ffeeeb6.jpg
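
If you want the addresses as a list of strings rather than printed output, the loop above can be wrapped in a small helper. This is only a sketch: the name `image_sources` is made up for illustration, and skipping empty values and duplicates is an assumption about what is wanted here, not something the question states. It relies only on each element having a `get_attribute` method, as Selenium's WebElement does.

```python
def image_sources(elements):
    """Return the non-empty src values of the given elements, in order, without duplicates."""
    seen = set()
    sources = []
    for element in elements:
        # get_attribute may return None or an empty string when no src is set
        src = element.get_attribute("src")
        if src and src not in seen:
            seen.add(src)
            sources.append(src)
    return sources

# Usage with a live driver (not run here):
# urls = image_sources(driver.find_elements(By.XPATH, "//div[@class='product-picture']/img"))
```

The result is a plain Python list, so it can be written to a file or passed to a downloader directly.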

CodePudding user response:

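Use the get_attribute('src') method to grab the src value. Note that find_elements returns a list, so call get_attribute on each element rather than on the list itself:

    from selenium.webdriver.common.by import By

    srcs = [img.get_attribute('src') for img in driver.find_elements(By.XPATH, "//div[@class='product-picture']/img")]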

CodePudding user response:

You are using deprecated syntax. Please see the question "python selenium DeprecationWarning: find_element_by_* commands are deprecated"; use find_element(By...) and find_elements(By...) instead.

A more reliable way of locating elements that are likely to be lazy-loaded is to wait for them explicitly:

    images = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='product-picture']/img")))
    for i in images:
        print(i.get_attribute('src'))

You will also need the following imports:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

Selenium docs can be found at https://www.selenium.dev/documentation/
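
One caveat with lazy-loaded images: an img element can be present in the DOM before its src is filled in. WebDriverWait.until accepts any callable that takes the driver and returns a truthy value once the condition is met, so a custom condition can wait for the src values themselves. The following is a minimal sketch, not a built-in Selenium condition; the name `all_images_have_src` is hypothetical, and whether this page actually populates src late is an assumption.

```python
def all_images_have_src(locator):
    """Build a wait condition that is truthy once every matched element has a non-empty src."""
    def condition(driver):
        elements = driver.find_elements(*locator)
        if not elements:
            return False  # nothing matched yet, keep waiting
        sources = [element.get_attribute("src") for element in elements]
        # Hand back the list of src strings only when every one is populated
        return sources if all(sources) else False
    return condition

# Usage with a live driver (not run here):
# srcs = WebDriverWait(driver, 10).until(
#     all_images_have_src((By.XPATH, "//div[@class='product-picture']/img")))
```

Because until returns whatever the condition returns, `srcs` would already be the list of address strings, with no second loop needed.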
