I would like to scrape a single image from a website and store it in a specified folder, but all the tutorials I can find only teach how to scrape multiple images. For example, I would like to scrape the puppy image that appears right away at https://duckduckgo.com/?q=Puppy&t=h_&ia=web and save it on my desktop. How do I go about this?
The code I have figured out so far is:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
PATH = r"C:\Coding\Codes\Python\edgedriver\msedgedriver.exe"  # raw string so the backslashes are not read as escapes
driver = webdriver.Edge(PATH)
driver.maximize_window()
driver.get("https://duckduckgo.com/")
searchbox = driver.find_element_by_id("search_form_input_homepage")
searchbox.send_keys("Puppy")
searchbox.send_keys(Keys.ENTER)
#then save the puppy's image to a specified folder, say inside C:\Users\John\Desktop
CodePudding user response:
To scrape the value of the src attribute of the only image, you can use either of the following Locator Strategies:
Using css_selector:
print(driver.find_element(By.CSS_SELECTOR, "a.module__image>img").get_attribute("src"))
Using xpath:
print(driver.find_element(By.XPATH, "//a[@class='module__image']/img").get_attribute("src"))
Ideally, you should use WebDriverWait with visibility_of_element_located() so that the image is visible before you read its src. Again, you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.module__image>img"))).get_attribute("src"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='module__image']/img"))).get_attribute("src"))
Console Output:
https://duckduckgo.com/i/a49fa21e.jpg
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
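The snippets above only print the image URL; the question also asks how to store the file in a folder. A minimal sketch of that last step, using only the standard library, might look like the following. The function name save_image and the file name puppy.jpg are assumptions for illustration, not part of the original answer.

```python
import os
import shutil
import urllib.request


def save_image(src_url, folder, filename):
    """Download src_url and store it as folder/filename; return the full path."""
    os.makedirs(folder, exist_ok=True)  # create the target folder if it is missing
    filepath = os.path.join(folder, filename)
    # Stream the response body straight into the file instead of holding it in memory.
    with urllib.request.urlopen(src_url) as response, open(filepath, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return filepath
```

With the src value from one of the locators above stored in a variable src, a call such as save_image(src, r"C:\Users\John\Desktop", "puppy.jpg") should write the image to the desktop folder mentioned in the question.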
CodePudding user response:
You can use the urllib.request library to download the image:
import urllib.request
import random, string
# Replace 'your xpath' with the XPath of the image you want.
sampleImage = driver.find_element_by_xpath('your xpath').get_attribute('src')
# Build a random 5-letter file name.
characters = 5
letters = string.ascii_lowercase
img_str = ''.join(random.choice(letters) for i in range(characters))
fullname = img_str + '.jpg'
filepath = 'E:\\crawling\\IMG\\' + fullname
# Download the image to the target path.
urllib.request.urlretrieve(sampleImage, filepath)
print(fullname)
I hope this works for you. I use the random library to name the image with random characters.
Here is the code if you want to loop over several images:
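As an alternative to random names, the file name can often be taken from the image URL itself, which keeps names reproducible and avoids accidental collisions between short random strings. This is a sketch under the assumption that the URL path ends in a usable file name; the helper filename_from_url and its fallback to a random UUID are hypothetical additions, not part of the answer above.

```python
import posixpath
import uuid
from urllib.parse import urlparse


def filename_from_url(src_url, default_ext=".jpg"):
    """Return the last path segment of src_url, or a random name if there is none."""
    # urlparse().path drops the query string, so "?size=large" never ends up in the name.
    name = posixpath.basename(urlparse(src_url).path)
    if not name:
        # No file name in the URL: fall back to a random 32-character hex name.
        name = uuid.uuid4().hex + default_ext
    return name
```

For the URL shown in the first answer's console output, filename_from_url("https://duckduckgo.com/i/a49fa21e.jpg") returns "a49fa21e.jpg".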
import urllib.request
import random, string
j = 1
imagename = []
# Visit every table row; j is the 1-based row index used in the image XPath.
for images in driver.find_elements_by_xpath('//*[@id="w0"]/div[1]/div/div/div/div/div/div/div[1]/table/tbody/tr'):
    sampleImage = driver.find_element_by_xpath('//*[@id="w0"]/div[1]/div/div/div/div/div/div/div[1]/table/tbody/tr[%d]/td[1]/img' % (j,)).get_attribute('src')
    print(sampleImage)
    # Build a random 10-letter file name for this image.
    characters = 10
    letters = string.ascii_lowercase
    img_str = ''.join(random.choice(letters) for i in range(characters))
    fullname = img_str + '.jpg'
    filepath = 'E:\\crawling\\IMG-FARAH\\' + fullname
    urllib.request.urlretrieve(sampleImage, filepath)
    imagename.append(fullname)
    print(fullname)
    j = j + 1
I've also included a sample XPath; the counter j is incremented on every iteration so that each pass picks the image from the next table row.
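The loop can also be written without a manual counter by collecting the src values first and then iterating over them. The sketch below assumes the URLs have already been gathered into a list, for example with [img.get_attribute('src') for img in driver.find_elements(By.XPATH, 'your xpath')]. The function name download_all and the numbered file names are assumptions for illustration; the answer above uses random names instead.

```python
import os
import urllib.request


def download_all(src_urls, folder):
    """Save every URL in src_urls into folder as image_1.jpg, image_2.jpg, ..."""
    os.makedirs(folder, exist_ok=True)  # create the target folder if it is missing
    saved = []
    # enumerate replaces the hand-maintained counter j from the loop above.
    for index, src in enumerate(src_urls, start=1):
        filepath = os.path.join(folder, "image_%d.jpg" % index)
        urllib.request.urlretrieve(src, filepath)
        saved.append(filepath)
    return saved
```

The returned list plays the same role as imagename in the loop above, except that it holds full paths rather than bare file names.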