Good afternoon all,
I have been trying to develop a scraper for this specific page.
I am trying to extract the product titles and prices.
The code is the following:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import urllib.parse
website = 'https://www.thewhiskyexchange.com/c/339/rum'
response = requests.get(website)
response.status_code
soup = BeautifulSoup(response.content, 'html.parser')
results = soup.find_all('li',{'product-grid__item'})
If I run len(results), I get 24.
However, when I actually look at a single result (results[0]), I only get 1 item returned:
<li ><a href="/p/63818/bumbu-the-original-rum-glass-pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])" title=" Bumbu The Original Rum Glass Pack"><div ><img alt="Bumbu The Original Rum Glass Pack" height="4" loading="lazy" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" width="3"/></div><div ><p > Bumbu The Original Rum<span >Glass Pack</span></p><p > 70cl / 40% </p></div><div ><p > £39.95 </p><p > (£57.07 per litre) </p></div></a></li>
My question is: am I looking at the right class? I tried other classes, but they don't seem to work either. Or is there a problem with the code?
(I should say I am trying to teach myself how to code, so I wouldn't be surprised if something is missing.)
CodePudding user response:
Everything is OK. results is actually a list (which means the search soup.find_all('li',{'product-grid__item'}) matched many elements), so with results[0] you are accessing only the first element of that list. You can do print(results) to see all the elements in results, or use a for loop:
for result in results:
    print(result)
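To make the list behaviour concrete, here is a minimal, self-contained sketch. It parses a small hand-written HTML snippet instead of the real page, so no network request is needed; the snippet and the product names in it are assumptions made up for illustration. It shows that find_all returns a list-like collection, while indexing it returns one single element.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption, not the real page source).
SAMPLE_HTML = """
<ul>
  <li class="product-grid__item"><p class="product-card__name">Rum A</p></li>
  <li class="product-grid__item"><p class="product-card__name">Rum B</p></li>
  <li class="product-grid__item"><p class="product-card__name">Rum C</p></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find_all("li", class_="product-grid__item")

count = len(results)   # number of matching <li> elements
first = results[0]     # one single element, not the whole list
names = [li.get_text(strip=True) for li in results]

print(count)
print(first.name)
print(names)
```

So len(results) counts every match, and results[0] is just the first of them; looping over results visits each product in turn.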
CodePudding user response:
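The product title is the text node immediately inside the .product-card__name element. To get the value of a text node, you can call the .find(text=True) method (in recent versions of Beautiful Soup the argument is spelled string=True; text=True is the older name). The price can be grabbed the same way. Now it's working: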
from bs4 import BeautifulSoup
import requests

website = 'https://www.thewhiskyexchange.com/c/339/rum'
response = requests.get(website)
response.status_code  # HTTP status of the request (200 means success)
soup = BeautifulSoup(response.content, 'html.parser')
results = soup.find_all('li',{'product-grid__item'})
for result in results:
    # The first text node inside the name element is the product title
    title = result.select_one('.product-card__name').find(text=True)
    print(title)
    try:
        price = result.select_one('.product-card__unit-price').find(text=True).replace('(','').replace(')','')
        print(price)
    except AttributeError:
        # select_one returned None: this product has no unit price
        pass
Output:
Bumbu The Original Rum
£57.07 per litre
Kraken Black Spiced
£54.64 per litre
Kraken Black Roast Coffee Rum
£38.21 per litre
Doorly's 14 Year Old Rum
£87.79 per litre
Admiral Vernon's Old J Spiced Tiki Fire Rum
£59.93 per litre
Ron Zacapa Centenario Sistema Solera 23 Rum
£78.50 per litre
Old Monk 7 Year Old Rum
£35.64 per litre
Diplomatico Reserva Exclusiva Rum
£64.21 per litre
Pusser's Select Aged 151 Navy Rum
£69.93 per litre
Diplomatico Reserva Exclusiva Rum
£58.50 per litre
El Dorado Rum 15 Year Old
£78.50 per litre
Plantation Extra Old Barbados Rum
£77.50 per litre
Captain Morgan Black Spiced
Doorly's XO Rum
£53.50 per litre
Mount Gay XO Triple Cask Blend
£76.79 per litre
Diplomatico Reserva Exclusiva Rum
£58.50 per litre
Plantation Barbados 5 Year Old Signature Blend Rum
£44.64 per litre
Worthy Park Single Estate Reserve
£69.93 per litre
Pusser's Blue Label British Navy Rum
£39.93 per litre
Ron Zacapa Centenario XO Rum Solera Gran Reserva Especial
£150 per litre
Havana Club 3 Year Old Rum
£30.64 per litre
Santa Teresa 1796 Rum
£74.93 per litre
Eminente Reserva 7 Year Old
£64.93 per litre
Bumbu The Original Rum
£48.21 per litre
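Note that the output above lists the per-litre price, and that one product (Captain Morgan Black Spiced) has no price line because the try/except block skipped it. The sketch below is one possible way to collect the title and the per-litre price into records and turn the price into a number. It runs on a hand-written snippet modelled on the markup in the question, so the live page may differ. The second sample product and the helper name parse_unit_price are hypothetical and exist only for this illustration.

```python
import re
from bs4 import BeautifulSoup

# Hand-written sample modelled on the markup in the question (an assumption;
# the live page may differ). The second item has no unit price on purpose.
SAMPLE_HTML = """
<ul>
  <li class="product-grid__item">
    <p class="product-card__name"> Bumbu The Original Rum<span>Glass Pack</span></p>
    <p class="product-card__unit-price"> (£57.07 per litre) </p>
  </li>
  <li class="product-grid__item">
    <p class="product-card__name"> Example Rum Without Price </p>
  </li>
</ul>
"""


def parse_unit_price(text):
    """Return the first number found in text as a float, or None.

    Hypothetical helper written for this sketch.
    """
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = []
for item in soup.find_all("li", class_="product-grid__item"):
    name_tag = item.select_one(".product-card__name")
    price_tag = item.select_one(".product-card__unit-price")
    # find(string=True) returns only the first text node, so the text of the
    # nested <span> ("Glass Pack") is not part of the title.
    title = name_tag.find(string=True).strip()
    # Store None instead of skipping the product when there is no price.
    unit_price = parse_unit_price(price_tag.get_text()) if price_tag else None
    records.append({"title": title, "unit_price_per_litre": unit_price})

for record in records:
    print(record)
```

Checking for a missing element explicitly keeps every product in the result, which makes gaps visible instead of silently dropping them. A list of dictionaries like records can also be passed straight to pandas.DataFrame if a table is wanted.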