Home > Back-end >  How do I scrape just one specific image using Python Selenium?
How do I scrape just one specific image using Python Selenium?

Time:11-09

I would like to scrape an image from a website and store it in a specified folder but all the tutorials out there only seem to teach how to scrape multiple images. For example, I would like to scrape this puppy image that can be seen right away from https://duckduckgo.com/?q=Puppy&t=h_&ia=web and save it on my desktop. How do I go on about this?

The codes that I have only figured out so far is:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

PATH = "C:\Coding\Codes\Python\edgedriver\msedgedriver.exe"
driver = webdriver.Edge(PATH)
driver.maximize_window()
driver.get("https://duckduckgo.com/")

searchbox = driver.find_element_by_id("search_form_input_homepage")
searchbox.send_keys("Puppy")
searchbox.send_keys(Keys.ENTER)

#then save the puppy's image to a specified folder, say inside C:\Users\John\Desktop

CodePudding user response:

To scrape the value of the src attribute of the only image, you can use either of the following Locator Strategies:

  • Using css_selector:

    print(driver.find_element(By.CSS_SELECTOR, "a.module__image>img").get_attribute("src"))
    
  • Using xpath:

    print(driver.find_element(By.XPATH, "//a[@class='module__image']/img").get_attribute("src"))
    

Ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.module__image>img"))).get_attribute("src"))
    
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='module__image']/img"))).get_attribute("src"))
    
  • Console Output:

    https://duckduckgo.com/i/a49fa21e.jpg
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

CodePudding user response:

You can use urllib.request library

import urllib.request
from random import *
import random,string

sampleImage = driver.find_element_by_xpath('your xpath').get_attribute('src')
characters = 5
letters = string.ascii_lowercase
img_str = ''.join(random.choice(letters) for i in range(characters))
fullname = str(img_str)   '.jpg'
filepath = 'E:\\crawling\\IMG\\'   fullname
urllib.request.urlretrieve(sampleImage,filepath)
print(fullname)

I hope this will work out. I use random library for naming the image with random characters.

Here is the code if you want to loop over images

import urllib.request
from random import *
import random,string

j=1
imagename=[]
for images in driver.find_elements_by_xpath('//*[@id="w0"]/div[1]/div/div/div/div/div/div/div[1]/table/tbody/tr'):
        sampleImage[j] = driver.find_element_by_xpath('//*[@id="w0"]/div[1]/div/div/div/div/div/div/div[1]/table/tbody/tr[%d]/td[1]/img' % (j,)).get_attribute('src')
        print(sampleImage[j])
        characters = 10
        letters = string.ascii_lowercase
        img_str = ''.join(random.choice(letters) for i in range(characters))
        fullname[j] = str(img_str)   '.jpg'
        filepath[j] = 'E:\\crawling\\IMG-FARAH\\'   fullname[j]
        urllib.request.urlretrieve(sampleImage[j],filepath[j])
        imagename.append(fullname[j])
        print(fullname[j])
        j=j 1   

I've also added the sample xpath and variable which would update after each count

  • Related