How to scrape images from slider/slideshow?

Time:03-01

This e-commerce page, https://www.jooraccess.com/r/products?token=feba69103f6c9789270a1412954cf250, lists hundreds of products, and each product has an image slider (or slideshow, whatever you call it). I need to scrape every image on the page. I know how to grab the first image in each slider, but I can't figure out how to scrape the remaining images in each slider.

I inspected the slider and noticed that each time I change the image, this part

<div data-position="4" ></div> 

moves to the matching position in the list below (in this example, image #4 is selected):

<div data-testid="breadcrumbContainer">
    <div data-position="0" ></div>
    <div data-position="1" ></div>
    <div data-position="2" ></div>
    <div data-position="3" ></div>
    <div data-position="4" ></div>
    <div data-position="5" ></div>
</div>
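
The container above also shows how many images a slider holds: one data-position div per image. Here is a minimal sketch that counts them with Python's standard html.parser, run on a copy of the markup above rather than on the live page. The PositionCollector name is made up for illustration, and on the real page it would need to be fed the HTML of a single product's slider.

```python
from html.parser import HTMLParser


class PositionCollector(HTMLParser):
    """Collects every data-position value found in the markup it is fed."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        value = dict(attrs).get("data-position")
        if value is not None:
            self.positions.append(int(value))


# Copy of the breadcrumb markup shown above (one slider only).
SLIDER_HTML = """
<div data-testid="breadcrumbContainer">
    <div data-position="0" ></div>
    <div data-position="1" ></div>
    <div data-position="2" ></div>
    <div data-position="3" ></div>
    <div data-position="4" ></div>
    <div data-position="5" ></div>
</div>
"""

collector = PositionCollector()
collector.feed(SLIDER_HTML)
positions = collector.positions
print(len(positions), "images, positions", positions)
```

Knowing the positions up front gives a scraper the exact list of slides to step through for each product.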

CodePudding user response:

You cannot collect all of those images directly from the page source.
Only one image per product is present on the page at any given time.
To load another image, you have to click one of the thumbnail radio buttons below the product. This triggers some JS that loads that image for the product.
In other words, the images that are not displayed do not exist on the page until they are loaded by clicking the thumbnail radio buttons below each product.
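
If a browser is driven to perform those clicks, the extra step is waiting for the JS to swap the image before reading it, because reading right after the click can still return the previous URL. Below is a minimal sketch of that click-then-poll loop, kept independent of any browser library. The names collect_slider_srcs, click and read_src are hypothetical: the two callables stand in for whatever clicks a radio button and reads the current src, and FakeSlider only simulates a slider whose image changes shortly after a click.

```python
import time


def collect_slider_srcs(positions, click, read_src, timeout=10.0, poll=0.1):
    """Return the image URL shown after selecting each position, in order.

    click(position) selects a slide and read_src() returns the URL currently
    displayed. Both are hypothetical callables supplied by the caller.
    """
    srcs = []
    previous = None
    for position in positions:
        click(position)
        deadline = time.monotonic() + timeout
        current = read_src()
        # Poll until the displayed URL differs from the previous slide's URL,
        # or give up at the deadline and keep whatever is shown.
        while current == previous and time.monotonic() < deadline:
            time.sleep(poll)
            current = read_src()
        srcs.append(current)
        previous = current
    return srcs


class FakeSlider:
    """Stand-in for a real slider: the image changes one read after a click."""

    def __init__(self, urls):
        self.urls = urls
        self.shown = 0
        self.pending = 0

    def click(self, position):
        self.pending = position

    def read_src(self):
        url = self.urls[self.shown]
        self.shown = self.pending  # the next read sees the newly loaded image
        return url


slider = FakeSlider(["a.jpg", "b.jpg", "c.jpg"])
result = collect_slider_srcs([0, 1, 2], slider.click, slider.read_src, poll=0.01)
print(result)
```

With Selenium, click would wrap the click on the radio button for a given position, and read_src would wrap get_attribute("src") on that product's img element. Note that two consecutive slides sharing the same URL would make the loop wait for the full timeout before moving on.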

CodePudding user response:

To scrape the src attribute of every image in the first product's slider, you need to:

  • Click each slide in turn, using WebDriverWait with element_to_be_clickable()

  • Collect the value of each src attribute, using WebDriverWait with visibility_of_element_located()

  • You can use the following locator strategies:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # any WebDriver works; Chrome is assumed here
    wait = WebDriverWait(driver, 20)
    driver.get("https://www.jooraccess.com/r/products?token=feba69103f6c9789270a1412954cf250")
    # Slider of the first product; the Grid_* class names are specific to this page.
    slider = "//div[contains(@class, 'Grid_Row__2R-IV') and contains(@class, 'Grid_left')]/div"
    # Image 0 is displayed on load.
    print(wait.until(EC.visibility_of_element_located((By.XPATH, slider + "//img"))).get_attribute("src"))
    # Click positions 1 to 4 in turn and print the image shown after each click.
    for position in range(1, 5):
        wait.until(EC.element_to_be_clickable((By.XPATH, f"{slider}//div[@data-position='{position}']"))).click()
        print(wait.until(EC.visibility_of_element_located((By.XPATH, slider + "//img"))).get_attribute("src"))
    
  • Console Output:

    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Sundays_NYC_3202 (1).jpg
    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Sundays_NYC_3207.jpg
    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Maya dress_Floral03.jpg
    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Maya dress_Floral04.jpg
    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Maya dress_Floral05.jpg
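
Some of the URLs in this output contain spaces and parentheses, so they may need percent-encoding before being handed to an HTTP client for download. Here is a minimal sketch using only urllib.parse from the standard library. The helper names encode_image_url and image_filename are made up for illustration, and the two sample URLs are copied from the output above.

```python
import posixpath
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_image_url(url):
    """Percent-encode the path of a URL, leaving existing %XX escapes alone."""
    parts = urlsplit(url)
    # safe="/%" keeps path separators and avoids encoding a URL twice.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


def image_filename(url):
    """Return the last path segment, decoded, for use as a local file name."""
    return posixpath.basename(unquote(urlsplit(url).path))


scraped = [
    "https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Sundays_NYC_3202 (1).jpg",
    "https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Maya dress_Floral03.jpg",
]

for url in scraped:
    print(image_filename(url), "->", encode_image_url(url))
```

Keeping "%" in the safe set means a URL that is already encoded passes through unchanged, so the helper can be applied whether or not the browser has encoded the src.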
    