How can you download all non-obvious images with Beautifulsoup/Selenium?

Time:11-30

I'm working on another small project to learn, and I've been stuck on it for the past few days. I want to build a list of opals with their prices and download their images from the website. In the end there are (probably) two options: match the opals to their images (in Word or Excel), or just save each image under the opal's name and price. I succeeded in building the list of names and prices (with help from previous topics), but I can't get the images. The img tags don't have a plain .jpg link, there is no 'src' attribute, and when I extracted an example img link, it didn't work either. Take a look:

/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6364/F352/FF36/99BC/5F60/0A0C/6D0B/F3AC/IMG-8248VOLL_5_ECK.JPG&shop=80300026&width={width}&height=2560

EDIT: I've edited the code, too. I managed to extract just the list of img tags I needed.

This is what the img element on this website looks like:

(screenshot of the img element)

And my code is below - it doesn't download images yet :(

import requests
from bs4 import BeautifulSoup

URL = 'https://www.koroit-opal-company.com/en/c/solid-opals'
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"}
# request the page once, sending the browser-like headers
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.text, "lxml")
# print(soup.prettify())

names = [n.getText(strip=True) for n in soup.select("div div div a h2")]
# print(names)
prices = [n.getText(strip=True) for n in soup.select("div div div h3")]
# print(prices)

for name, price in zip(names, prices):
    print(f"{name} {price}")

opal_list = soup.find('div', attrs={'class': 'content'})  # only the part of the page that holds the opal images
imgs = opal_list.find_all('img')
print(imgs)

example = imgs[0]
x = example.attrs['data-src']
print(x)

result = soup.find_all(lambda tag: tag.name == 'img' and
                       tag.get('class') == ['product-item-image'])

print(result)
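
A note on the data-src value printed by the code above: it is a relative link with a literal {width} placeholder, so it cannot be requested as it stands. A minimal sketch of turning it into a full URL, assuming the server accepts any reasonable number in place of {width} (this is a guess based on the link format, not something the site documents). The helper name image_url_from_data_src and the default width are made up for this example.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.koroit-opal-company.com"


def image_url_from_data_src(data_src: str, width: int = 2560) -> str:
    """Fill the {width} placeholder and make the link absolute.

    Assumption: the image endpoint accepts a plain integer width.
    """
    filled = data_src.replace("{width}", str(width))
    # urljoin prepends scheme and host to a path that starts with "/"
    return urljoin(BASE_URL, filled)


data_src = (
    "/storage/images/image?remote=https://www.koroit-opal-company.com"
    "/WebRoot/Store15/Shops/80300026/6364/F352/FF36/99BC/5F60/0A0C/6D0B/F3AC"
    "/IMG-8248VOLL_5_ECK.JPG&shop=80300026&width={width}&height=2560"
)
print(image_url_from_data_src(data_src))
```

The resulting URL could then be passed to requests.get(...) and the response content written to a file opened in 'wb' mode.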

CodePudding user response:

You can use the API. All downloaded images will be saved in the directory you run the script from:

import requests


def get_info(page: int):
    url = f"https://www.koroit-opal-company.com/api/v2/products?sort=position-asc&resultsPerPage=12&page={page}&categoryId=551DDBD9-8AD4-229C-164A-C0A82AB9D825&locale=en_GB&shop=80300026"
    response = requests.get(url)
    for opal in response.json()['products']:
        img_link = 'https://www.koroit-opal-company.com' + opal['image']['url']
        print(opal['name'], opal['price']['formatted'], img_link)
        download_image(img_link, opal['name'])


def download_image(image_url: str, image_name: str):
    img_data = requests.get(image_url).content
    with open(image_name + '.jpg', 'wb') as handler:
        handler.write(img_data)


get_info(1)
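
get_info(1) only covers the first page. Below is a minimal sketch of walking through every page, assuming (this is not confirmed by the site) that the API returns an empty 'products' list once the page number runs past the last page. The names fetch_products, iter_all_products and max_pages are made up for this example; the query parameters are copied from the URL above.

```python
from typing import Callable, Iterator

API_URL = "https://www.koroit-opal-company.com/api/v2/products"
# Query parameters copied from the URL used in get_info
BASE_PARAMS = {
    "sort": "position-asc",
    "resultsPerPage": 12,
    "categoryId": "551DDBD9-8AD4-229C-164A-C0A82AB9D825",
    "locale": "en_GB",
    "shop": "80300026",
}


def fetch_products(page: int) -> list:
    """Fetch one page of products from the API (network call)."""
    # Imported here so the paging helper below works without requests installed
    import requests

    response = requests.get(API_URL, params={**BASE_PARAMS, "page": page}, timeout=30)
    response.raise_for_status()
    return response.json().get("products", [])


def iter_all_products(
    fetch: Callable[[int], list] = fetch_products, max_pages: int = 100
) -> Iterator[dict]:
    """Yield products page by page until a page comes back empty.

    Assumption: an out-of-range page yields an empty 'products' list.
    max_pages is a safety cap in case that assumption is wrong.
    """
    for page in range(1, max_pages + 1):
        products = fetch(page)
        if not products:
            break
        yield from products
```

Looping with "for opal in iter_all_products():" and calling download_image inside the loop would then cover the whole category rather than just page 1.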

OUTPUT for page 1:

1,38 ct Boulder opal 70.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6364/F352/FF36/99BC/5F60/0A0C/6D0B/F3AC/IMG-8248VOLL_5_ECK.JPG&shop=80300026
4,27 ct Black opal 3,300.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/5A8D/6108/2341/EB00/2803/0A0C/6D04/CDD2/IMG-7618VOLL_5_ECK.JPG&shop=80300026
4,52 ct Boulder opal 80.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6364/F4E2/AFCE/F6ED/688E/0A0C/6D0F/3E17/IMG-8110VOLL_5_ECK.JPG&shop=80300026
0,85 ct Boulder opal 70.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6364/F540/D8D2/E5C6/9854/0A0C/6D0B/05BA/IMG-8018VOLL_5_ECK.JPG&shop=80300026
13,14 ct Crystal opal 400.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6286/56CC/ED12/1E08/ECEF/0A0C/6D0B/5AAC/IMG-9213_HOCH_MAI_22.JPG&shop=80300026
5,85 ct Boulder opal 80.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6364/F59C/77B0/F735/05AF/0A0C/6D0F/8E4C/IMG-7990VOLL_5_ECK.JPG&shop=80300026
11,38 ct Boulder opal 2,400.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/594A/94B3/C880/F5AB/848B/C0A8/2AB9/4A53/IMG-8598VOLL_5_ECK.JPG&shop=80300026
2,41 ct Crystal opal 60.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6364/F66A/867E/23D7/BF76/0A0C/6D0B/947B/IMG-8994VOLL_5_ECK.JPG&shop=80300026
6,94 ct Boulder opal 1,600.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/5AB9/3AB7/8F7A/659C/DDE4/0A0C/6D00/F73C/IMG-7672VOLL_5_ECK.JPG&shop=80300026
2,28 ct Crystal opal 70.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6364/F4D7/AEFE/F74C/9A52/0A0C/6D0B/0587/IMG-8045VOLL_5_ECK.JPG&shop=80300026
1,48 ct Crystal opal 70.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6364/F3B9/7E20/2E27/FB71/0A0C/6D0F/C782/IMG-8242VOLL_5_ECK.JPG&shop=80300026
11,29 ct Crystal opal 600.00 € https://www.koroit-opal-company.com/storage/images/image?remote=https://www.koroit-opal-company.com/WebRoot/Store15/Shops/80300026/6335/B93C/9276/3389/B532/0A0C/6D0B/3404/IMG-5671VOLL_5_ECK.JPG&shop=80300026
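
Since the question asks for images saved under the opal's name and price, here is a rough sketch of building such file names. It assumes two opals can share the same name and price (which would otherwise overwrite each other's files) and that names may contain characters some file systems reject. safe_filename and unique_filename are hypothetical helpers, not part of the code above.

```python
import os
import re


def safe_filename(name: str, price: str, extension: str = ".jpg") -> str:
    """Combine name and price into a file name without forbidden characters."""
    label = f"{name} {price}"
    # Replace characters that are invalid in Windows file names
    label = re.sub(r'[\\/:*?"<>|]', "_", label)
    # Collapse runs of whitespace into single spaces
    label = re.sub(r"\s+", " ", label).strip()
    return label + extension


def unique_filename(filename: str, used: set) -> str:
    """Append a counter such as ' (2)' if the file name was already used."""
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 2
    while candidate in used:
        candidate = f"{stem} ({counter}){ext}"
        counter += 1
    used.add(candidate)
    return candidate


used_names = set()
first = unique_filename(safe_filename("1,38 ct Boulder opal", "70.00 €"), used_names)
second = unique_filename(safe_filename("1,38 ct Boulder opal", "70.00 €"), used_names)
print(first)
print(second)
```

The second call gets a " (2)" suffix, so two stones with identical name and price end up in separate files.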

UPDATE: michalb93 asked how to find the API link:

  1. Open dev tools on the Network tab, then press the "Show more" button on the page.
  2. Press Ctrl+F and search for the name of one of the newly loaded opals; I searched for "2,75 ct Crystal opal".
  3. After pressing Enter, you get a list of requests whose responses contain this name, as shown in the attached screenshot of the Chrome Network tab.
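
Once the request URL has been copied from the Network tab, it can help to split it into the endpoint and its query parameters to see which ones are worth changing (page and resultsPerPage, for instance). A small sketch using only the standard library; split_api_url is a name made up for this example, and page=1 stands in for whatever page number the copied URL contains.

```python
from urllib.parse import parse_qsl, urlsplit


def split_api_url(url: str):
    """Return (endpoint, params) for a URL copied from the Network tab."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qsl returns (key, value) pairs; all values are strings
    params = dict(parse_qsl(parts.query))
    return endpoint, params


api_url = (
    "https://www.koroit-opal-company.com/api/v2/products"
    "?sort=position-asc&resultsPerPage=12&page=1"
    "&categoryId=551DDBD9-8AD4-229C-164A-C0A82AB9D825"
    "&locale=en_GB&shop=80300026"
)
endpoint, params = split_api_url(api_url)
print(endpoint)
for key, value in params.items():
    print(f"  {key} = {value}")
```

The endpoint and the params dict can then be passed to requests.get(endpoint, params=params), which makes it easy to change a single parameter such as page.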