I am trying to scrape product data (mainly the URL, product name and EAN) from pricerunner.dk. Specifically, I need to scrape https://www.pricerunner.dk/cl/1424/OEl-Spiritus and https://www.pricerunner.dk/cl/465/Vin. I want to scrape all products on these URLs and put them in an Excel sheet.
This is what I have so far, but it isn't working. I looked for a JSON file or API URL that fetches the products, but couldn't find one. I also can't find the EAN in the inspector for some reason. Any help would be greatly appreciated.
import random
import time

import openpyxl
import requests
from bs4 import BeautifulSoup

excel = openpyxl.Workbook()
sheet = excel.active
sheet.title = 'SpiritsUrlsPricerunner'
#sheet.append(['productnaam', 'URL'])
url = 'https://www.pricerunner.dk/cl/1424/OEl-Spiritus'
# Swap the User-Agent from the Python crawler's default to a real browser's
windowsheader = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"}
firefoxheader = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0"}
macheader = {"User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36"}
chromeheader = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"}
safariheader = {"User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15"}
linuxheader = {"User-Agent" : "Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0"}
chromelinuxheader = {"User-Agent" : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36"}
rotateheaders = [windowsheader, firefoxheader, macheader, chromeheader, safariheader, linuxheader, chromelinuxheader]

# Function to get the text from an HTML element
def getTextFromHTMLItem(HTMLItem):
    try:
        return HTMLItem.text
    except AttributeError:  # find() returned None
        return " "

# Function to get the href from an HTML element as an absolute URL
def getHREFFromHTMLItem(HTMLItem):
    try:
        return 'https://www.pricerunner.dk' + HTMLItem['href']
    except (TypeError, KeyError):  # no element, or element without an href
        return " "

# Function that opens a page and returns the parsed HTML
def getdata(url):
    try:
        headers = random.choice(rotateheaders)
        source = requests.get(url, headers=headers)
        source.raise_for_status()
        soup = BeautifulSoup(source.text, 'html.parser')
        wachttijd = random.randint(0, 1)
        print("Success! URL:", url, "Wait time is:", wachttijd, "seconds")
        # Extract the product info from the HTML
        productlist = soup.find('div', {'class': 'mIkxpLfxgo css-183umi2'}).find_all('div', {'class': 'al5wsmjlcK'})
        for productinfo in productlist:
            productnaam = getTextFromHTMLItem(productinfo.find('h3', {'class': 'pUoKQGvtG9 sQ60lfZFoA nsNMYyHYau css-1rr2efs'}))
            product_url = getHREFFromHTMLItem(productinfo.find('a'))
            # Print the information
            print(productnaam, product_url)
            # Put the information in a sheet row
            #print("Sheet append")
            #sheet.append([product_url])
            #time.sleep(1)
        time.sleep(wachttijd)
        print("Saving sheet")
        excel.save('C:/Python/Files/SpiritsUrlsPricerunner.xlsx')
        return soup
    except Exception as e:
        wachttijd = random.randint(0, 1)
        # Print the exception too, so you can see why the page failed
        print("Failed! URL:", url, "Wait time is:", wachttijd, "seconds. Error:", e)
        time.sleep(wachttijd)

getdata(url)
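The commented-out `sheet.append` lines in the question's code are the right approach for the Excel part. A minimal sketch of just that step, assuming the scraped data ends up as `(product name, URL, EAN)` tuples; the helper name `save_rows` and its column headers are hypothetical:

```python
from openpyxl import Workbook


def save_rows(rows, path, title="SpiritsUrlsPricerunner"):
    """Write a header row plus one row per product to an .xlsx file.

    Hypothetical helper: `rows` is assumed to be an iterable of
    (name, url, ean) tuples. Returns the number of rows written,
    including the header.
    """
    wb = Workbook()
    ws = wb.active
    ws.title = title
    ws.append(["Product name", "URL", "EAN"])  # header row
    for row in rows:
        ws.append(list(row))  # one spreadsheet row per product
    wb.save(path)
    return ws.max_row
```

Passing the EAN as a string rather than an int keeps Excel from dropping leading zeros or showing long barcodes in scientific notation.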
Answer:
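The data is loaded dynamically by JavaScript from an API, so it is not in the HTML that requests downloads, which is why the soup.find() call in your code finds nothing. You can get all the data you need from that API directly; the request shows up in your browser's developer tools under the Network tab (filter on Fetch/XHR).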
Example:
import requests
api_url = 'https://www.pricerunner.dk/public/search/category/categoryoffers/dk/1424?size=48&offset=64&af_56525176=58381603&sorting=RANK_asc'
req = requests.get(api_url).json()['categoryOffers']
for item in req:
    merchant = item['merchant']['name']
    url = 'https://www.pricerunner.dk' + item['url']
    print(merchant, url)
Output:
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/6ff0a6c3ff8ea75861804f72797c54ad
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/e1cdabfc359f04486cbbd1f5fff34ab9
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/f5b876d97b6e5f4e1b8b3dc61c044ecb
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/7782e3c9deae9bc3977140c36306be17
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/3d7dcd1eb622b75fa56edea04125041b
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/9f0f1c73726e09fd675fcdcd9fd9ee7f
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/f3cacede63e7dbf2c8ffbed5f8037802
YourSurprise https://www.pricerunner.dk/gotostore/v1/DK/e40973d63a5d90b6bb75290630d8d29d
Design Og Handelshuset https://www.pricerunner.dk/gotostore/v1/DK/b20fd587987c31f6c2bd08b17987b431
Design Og Handelshuset https://www.pricerunner.dk/gotostore/v1/DK/797b5b7524f6e1ba2824f383917016bb
Design Og Handelshuset https://www.pricerunner.dk/gotostore/v1/DK/d112996544ffc1fb1f30bad6f3bb2261
Design Og Handelshuset https://www.pricerunner.dk/gotostore/v1/DK/7e59abe376c02a7a465ccafdb016a053
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/0a7183fc651da4dc07d7fff10f6a98ba
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/532305307a8066f86caaa99c24b549a3
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/d8417301dfd8b590e65edb978652d468
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/834294eceaff8f3dc501b1359c5d4b48
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/a6666dab855973ffa2477245757827ad
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/dbba0214de986e71d51c08b8b6e043fb
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/2998e41ebab27dd5f0df3d680ed14070
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/bf8dce34808530ee4bdb4a2a71565ca5
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/c08f295fcf423bc3d1eebfe40ae34b3c
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/6bdb4583345d26902f877964196b8db7
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/d9c3a27293da7b91428738b76d896910
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/01895f938f198e8faa2b3f2f4b60128c
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/a2e13a69baf0b77290ec44d28007c954
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/0c55005ffc12644a8460f52ade8939ca
2010 Vin & Velsmag https://www.pricerunner.dk/gotostore/v1/DK/509c72e4571fc4dc73e17617167ffe2a
Falkensten Vin https://www.pricerunner.dk/gotostore/v1/DK/ceed3c287329b3a92806fe0f7a5798d3
Falkensten Vin https://www.pricerunner.dk/gotostore/v1/DK/c2cb648051eab36587235750db22b332
Falkensten Vin https://www.pricerunner.dk/gotostore/v1/DK/a8b2c2e49abd3840562903af2c22253d
Waku Waku https://www.pricerunner.dk/gotostore/v1/DK/68947010e457733c29fea67d3c4b80a1
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/b792cdc84af1171d15894590699f91fd
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/a42b5737d0a269382cae3495e1723590
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/bd76a94a6566a0128a494f30ef530661
Ginbutikken https://www.pricerunner.dk/gotostore/v1/DK/74cb288053fd9aded7dfc7263794e230
Ginbutikken https://www.pricerunner.dk/gotostore/v1/DK/1fe103f228f8680560322c0daa276af8
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/264b239036d79d727a0a3c0649e9e380
Ginbutikken https://www.pricerunner.dk/gotostore/v1/DK/da6c8c3f2b612032f7288aa9f0930b65
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/21dd79509ff4859f754df99377c9bef7
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/619b06827d891f914fa857eb1027e338
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/3de15df38e2f553e2b2d6ef0afb8dfa0
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/2b3426ba0afd10371d77b26abdec3bce
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/737059c3ae153a6ca36349bf31df256c
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/5e094ba7222b956865aa86213323f163
Ginbutikken https://www.pricerunner.dk/gotostore/v1/DK/a3245bb55f49d11c1687ad335f613351
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/cfaeb7678a99c30dfa1a4654a83a3497
Pandasia https://www.pricerunner.dk/gotostore/v1/DK/174c25fdb1be0717a85e0c5336950e10
Uhrskov Vine https://www.pricerunner.dk/gotostore/v1/DK/5efe765b951042caa5ab37daf56af172
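The example above requests a single page (`size=48`, `offset=64`). To collect a whole category, one approach is to step `offset` by `size` until a page comes back short or empty. A sketch of that, under these assumptions: the endpoint keeps the shape used in the answer (a `categoryOffers` list whose items have `merchant.name` and `url`), the `af_56525176` parameter in the answer's URL is a filter that can be left out, and a short page means the end of the category. It uses the standard library's `urllib` so it needs no extra packages; `requests.get(...).json()` would work the same way. The helper names are hypothetical.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.pricerunner.dk"
# Endpoint copied from the answer; it is undocumented and may change.
OFFERS_PATH = "/public/search/category/categoryoffers/dk/{category_id}"


def page_url(category_id, offset, size=48):
    """Build the API URL for one page of a category (filter params omitted)."""
    query = urlencode({"size": size, "offset": offset, "sorting": "RANK_asc"})
    return BASE_URL + OFFERS_PATH.format(category_id=category_id) + "?" + query


def parse_offers(payload):
    """Turn one decoded JSON response into (merchant, absolute URL) tuples."""
    return [
        (item["merchant"]["name"], BASE_URL + item["url"])
        for item in payload.get("categoryOffers", [])
    ]


def fetch_category(category_id, size=48, max_pages=200, delay=1.0):
    """Page through a category until a short or empty page comes back."""
    rows = []
    for page in range(max_pages):  # max_pages is a safety cap
        request = Request(
            page_url(category_id, page * size, size),
            headers={"User-Agent": "Mozilla/5.0"},
        )
        with urlopen(request, timeout=30) as response:
            page_rows = parse_offers(json.load(response))
        rows.extend(page_rows)
        if len(page_rows) < size:  # assumed to mean the last page
            break
        time.sleep(delay)  # pause between requests
    return rows
```

For example, `fetch_category(1424) + fetch_category(465)` would cover both categories from the question, assuming the Vin category ID 465 works with the same endpoint. Note that these `url` values are `gotostore` redirect links to individual merchant offers rather than product pages, and the fields used here do not include an EAN.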