I'm new to Python and need some help. I am trying to scrape the image urls from this site but can't seems to do so. I pull up all the html. Here is my code.
import requests
import pandas as pd
import urllib.parse
from bs4 import BeautifulSoup
import csv
baseurl = ('https://www.thewhiskyexchange.com/')
productlinks = []
for x in range(1,4):
r = requests.get(f'https://www.thewhiskyexchange.com/c/316/campbeltown-single-malt-scotch-whisky?pg={x}')
soup = BeautifulSoup(r.content, 'html.parser')
tag = soup.find_all('ul',{'class':'product-grid__list'})
for items in tag:
for link in items.find_all('a', href=True):
productlinks.append(baseurl link['href'])
#print(len(productlinks))
for items in productlinks:
r = requests.get(items)
soup = BeautifulSoup(r.content, 'html.parser')
name = soup.find('h1', class_='product-main__name').text.strip()
price = soup.find('p', class_='product-action__price').text.strip()
imgurl = soup.find('div', class_='product-main__image-container')
print(imgurl)
And here is the piece of HTML I am trying to scrape from.
<div ><img src="https://img.thewhiskyexchange.com/480/gstob.non1.jpg" alt="Glen Scotia Double Cask Sherry Finish" loading="lazy" width="3" height="4">
I would appreicate any help. Thanks
CodePudding user response:
You need to first select the image then get the src attribute. Try this:
imgurl = soup.find('div', class_='product-main__image-container').find('img')['src']
CodePudding user response:
I'm not sure if I fully understand what output you are looking for. But if you just want the img source URLs, this might work:
# imgurl = soup.find('div', class_='product-main__image-container')
imgurl = soup.find('img', class_='product-main__image')
imgurl_attribute = imgurl['src']
print(imgurl_attribute[:5])
#https://img.thewhiskyexchange.com/900/gstob.non1.jpg
#https://img.thewhiskyexchange.com/900/gstob.15yov1.jpg
#https://img.thewhiskyexchange.com/900/gstob.18yov1.jpg
#https://img.thewhiskyexchange.com/900/gstob.25yo.jpg
#https://img.thewhiskyexchange.com/900/sets_gst1.jpg