How do I scrape just one specific image using Python Selenium?

I would like to scrape an image from a website and store it in a specified folder, but all the tutorials out there only seem to teach how to scrape multiple images. For example, I would like to scrape the puppy image that appears right away at https://duckduckgo.com/?q=Puppy&t=h_&ia=web and save it on my desktop. How do I go about this?

The code I have figured out so far is:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

PATH = r"C:\Coding\Codes\Python\edgedriver\msedgedriver.exe"
driver = webdriver.Edge(PATH)
driver.maximize_window()
driver.get("https://duckduckgo.com/")

searchbox = driver.find_element_by_id("search_form_input_homepage")
searchbox.send_keys("Puppy")
searchbox.send_keys(Keys.ENTER)

#then save the puppy's image to a specified folder, say inside C:\Users\John\Desktop

CodePudding user response：

To get the value of the src attribute of that single image, you can use either of the following locator strategies:

Using CSS_SELECTOR:

print(driver.find_element(By.CSS_SELECTOR, "a.module__image>img").get_attribute("src"))

Using XPATH:

print(driver.find_element(By.XPATH, "//a[@class='module__image']/img").get_attribute("src"))

Ideally, you should use WebDriverWait with visibility_of_element_located() so that the image is visible before you read its src. You can use either of the following locator strategies:

Using CSS_SELECTOR:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.module__image>img"))).get_attribute("src"))

Using XPATH:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='module__image']/img"))).get_attribute("src"))

Console Output:
```
https://duckduckgo.com/i/a49fa21e.jpg
```

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
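
To go from the printed src to a file in a folder, something like the following could work. It is a minimal sketch using only the standard library: `save_image` and its parameters are hypothetical names (not part of Selenium), and it assumes the src is a URL that `urllib` can fetch directly. Relative or protocol-relative src values are resolved against the page URL first.

```python
import os
import urllib.request
from urllib.parse import urljoin, urlparse


def save_image(src, page_url, folder):
    """Download the image at `src` into `folder` and return the saved path.

    Hypothetical helper: `src` is the value read from the <img> element,
    `page_url` is the page it came from (for example driver.current_url).
    """
    # Resolve relative ("/i/x.jpg") or protocol-relative ("//host/x.jpg") values
    full_url = urljoin(page_url, src)
    # Use the last path segment as the file name, with a fallback if it is empty
    name = os.path.basename(urlparse(full_url).path) or "image.jpg"
    # Create the target folder if it does not exist yet
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, name)
    urllib.request.urlretrieve(full_url, target)
    return target
```

For the case in the question, this would be called as `save_image(src, driver.current_url, r"C:\Users\John\Desktop")`, where `src` is the value printed above.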

CodePudding user response：

You can use the urllib.request library:

import random
import string
import urllib.request
from selenium.webdriver.common.by import By

# Read the image URL from the element's src attribute
sampleImage = driver.find_element(By.XPATH, 'your xpath').get_attribute('src')
characters = 5
letters = string.ascii_lowercase
# Build a file name from random lowercase letters
img_str = ''.join(random.choice(letters) for i in range(characters))
fullname = img_str + '.jpg'
filepath = 'E:\\crawling\\IMG\\' + fullname
# Download the image to the target path
urllib.request.urlretrieve(sampleImage, filepath)
print(fullname)

I hope this works for you. I use the random library to give the image a name made of random characters.
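
The random naming step above can also be pulled out into a small function. This is only a sketch: `random_image_path` is a hypothetical name, and the folder, name length, and extension are parameters here rather than hard-coded values.

```python
import os
import random
import string


def random_image_path(folder, length=5, ext=".jpg"):
    """Return a path inside `folder` with a random lowercase file name (hypothetical helper)."""
    name = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return os.path.join(folder, name + ext)
```

With only five lowercase letters, two downloads can occasionally get the same name and overwrite each other, so a longer `length` is the safer choice when saving many images.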

Here is the code if you want to loop over several images:

import random
import string
import urllib.request
from selenium.webdriver.common.by import By

imagename = []
rows = driver.find_elements(By.XPATH, '//*[@id="w0"]/div[1]/div/div/div/div/div/div/div[1]/table/tbody/tr')
# XPath indices start at 1, so count the table rows from 1
for j in range(1, len(rows) + 1):
    # Read the src of the image in the first cell of row j
    sampleImage = driver.find_element(By.XPATH, '//*[@id="w0"]/div[1]/div/div/div/div/div/div/div[1]/table/tbody/tr[%d]/td[1]/img' % (j,)).get_attribute('src')
    print(sampleImage)
    characters = 10
    letters = string.ascii_lowercase
    # Build a file name from random lowercase letters
    img_str = ''.join(random.choice(letters) for i in range(characters))
    fullname = img_str + '.jpg'
    filepath = 'E:\\crawling\\IMG-FARAH\\' + fullname
    # Download the image and remember the name it was saved under
    urllib.request.urlretrieve(sampleImage, filepath)
    imagename.append(fullname)
    print(fullname)

I've also included a sample XPath; the row index in it (j) is updated on each iteration.