Web scraping pricerunner.dk products using python?

Time:07-21

I am trying to scrape product data (mainly URL, product name and EAN) from pricerunner.dk. More specifically, I need to scrape https://www.pricerunner.dk/cl/1424/OEl-Spiritus and https://www.pricerunner.dk/cl/465/Vin. I want to scrape all products on these URLs and put them in an Excel sheet.

This is what I have so far, but it isn't working. I looked for a JSON file or an API URL that fetches the products, but couldn't find one. For some reason I also can't find the EAN in the inspector. Any help would be greatly appreciated.

    import random
    import time

    import openpyxl
    import requests
    from bs4 import BeautifulSoup

    excel = openpyxl.Workbook()
    sheet = excel.active
    sheet.title = 'SpiritsUrlsPricerunner'
    #sheet.append(['productnaam', 'URL'])

    url = 'https://www.pricerunner.dk/cl/1424/OEl-Spiritus'

    # Swap the default Python crawler User-Agent for that of a real browser

    windowsheader = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"}
    firefoxheader = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0"}
    macheader = {"User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36"}
    chromeheader = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"}
    safariheader = {"User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15"}
    linuxheader = {"User-Agent" : "Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0"}
    chromelinuxheader = {"User-Agent" : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36"}

    rotateheaders = [windowsheader, firefoxheader, macheader, chromeheader, safariheader, linuxheader, chromelinuxheader]

    # Helper that gets the text out of an HTML element

    def getTextFromHTMLItem(HTMLItem):
        try:
            return HTMLItem.text
        except AttributeError:
            # HTMLItem is None when find() matched nothing
            return " "

    # Helper that gets the href out of an HTML element

    def getHREFFromHTMLItem(HTMLItem):
        try:
            return 'https://www.pricerunner.dk' + HTMLItem['href']
        except (TypeError, KeyError):
            # No element found, or the element has no href attribute
            return " "

    # Function that opens a page and returns its HTML

    def getdata(url):
        try:
            headers = random.choice(rotateheaders)
            source = requests.get(url, headers=headers)
            source.raise_for_status()
            soup = BeautifulSoup(source.text, 'html.parser')
            wachttijd = random.randint(0, 1)
            print("Success! URL:", url, "Wait time is:", wachttijd, "seconds")

            # Extract the info from the HTML

            productlist = soup.find('div', {'class':'mIkxpLfxgo css-183umi2'}).find_all('div', {'class':'al5wsmjlcK'})
            for productinfo in productlist:
                productnaam = getTextFromHTMLItem(productinfo.find('h3', {'class':'pUoKQGvtG9 sQ60lfZFoA nsNMYyHYau css-1rr2efs'}))
                product_url = getHREFFromHTMLItem(productinfo.find('a'))

                # Print the info
                print(productlist)
                print(productnaam, product_url)

                # Put the info in a sheet row

                #print("Sheet append")
                #sheet.append([product_url])
                #time.sleep(1)


            time.sleep(wachttijd)
            print("Saving sheet")
            excel.save('C:/Python/Files/SpiritsUrlsPricerunner.xlsx')
            return soup

        except Exception as e:
            wachttijd = random.randint(0, 1)
            print("Fail! URL:", url, "Error:", e, "Wait time is:", wachttijd, "seconds")
            time.sleep(wachttijd)

CodePudding user response:

The data is loaded dynamically by JavaScript through an API, so you can grab all the desired data directly from that API.

Example:

    import requests

    # Category API endpoint (category 1424) that returns the offers as JSON
    api_url = 'https://www.pricerunner.dk/public/search/category/categoryoffers/dk/1424?size=48&offset=64&af_56525176=58381603&sorting=RANK_asc'

    # The offers are listed under the 'categoryOffers' key of the response
    req = requests.get(api_url).json()['categoryOffers']

    for item in req:
        merchant = item['merchant']['name']
        url = 'https://www.pricerunner.dk' + item['url']

        print(merchant, url)

Output:

Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/6ff0a6c3ff8ea75861804f72797c54ad
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/e1cdabfc359f04486cbbd1f5fff34ab9
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/f5b876d97b6e5f4e1b8b3dc61c044ecb       
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/7782e3c9deae9bc3977140c36306be17       
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/3d7dcd1eb622b75fa56edea04125041b       
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/9f0f1c73726e09fd675fcdcd9fd9ee7f       
Laudrup Vin https://www.pricerunner.dk/gotostore/v1/DK/f3cacede63e7dbf2c8ffbed5f8037802       
YourSurprise https://www.pricerunner.dk/gotostore/v1/DK/e40973d63a5d90b6bb75290630d8d29d      
Design Og Handelshuset https://www.pricerunner.dk/gotostore/v1/DK/b20fd587987c31f6c2bd08b17987b431
Design Og Handelshuset https://www.pricerunner.dk/gotostore/v1/DK/797b5b7524f6e1ba2824f383917016bb
Design Og Handelshuset https://www.pricerunner.dk/gotostore/v1/DK/d112996544ffc1fb1f30bad6f3bb2261
Design Og Handelshuset https://www.pricerunner.dk/gotostore/v1/DK/7e59abe376c02a7a465ccafdb016a053
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/0a7183fc651da4dc07d7fff10f6a98ba     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/532305307a8066f86caaa99c24b549a3     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/d8417301dfd8b590e65edb978652d468     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/834294eceaff8f3dc501b1359c5d4b48     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/a6666dab855973ffa2477245757827ad     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/dbba0214de986e71d51c08b8b6e043fb     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/2998e41ebab27dd5f0df3d680ed14070     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/bf8dce34808530ee4bdb4a2a71565ca5     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/c08f295fcf423bc3d1eebfe40ae34b3c     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/6bdb4583345d26902f877964196b8db7     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/d9c3a27293da7b91428738b76d896910     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/01895f938f198e8faa2b3f2f4b60128c     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/a2e13a69baf0b77290ec44d28007c954     
VildMedVin.dk https://www.pricerunner.dk/gotostore/v1/DK/0c55005ffc12644a8460f52ade8939ca     
2010 Vin & Velsmag https://www.pricerunner.dk/gotostore/v1/DK/509c72e4571fc4dc73e17617167ffe2a
Falkensten Vin https://www.pricerunner.dk/gotostore/v1/DK/ceed3c287329b3a92806fe0f7a5798d3
Falkensten Vin https://www.pricerunner.dk/gotostore/v1/DK/c2cb648051eab36587235750db22b332    
Falkensten Vin https://www.pricerunner.dk/gotostore/v1/DK/a8b2c2e49abd3840562903af2c22253d    
Waku Waku https://www.pricerunner.dk/gotostore/v1/DK/68947010e457733c29fea67d3c4b80a1
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/b792cdc84af1171d15894590699f91fd
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/a42b5737d0a269382cae3495e1723590
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/bd76a94a6566a0128a494f30ef530661
Ginbutikken https://www.pricerunner.dk/gotostore/v1/DK/74cb288053fd9aded7dfc7263794e230       
Ginbutikken https://www.pricerunner.dk/gotostore/v1/DK/1fe103f228f8680560322c0daa276af8       
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/264b239036d79d727a0a3c0649e9e380
Ginbutikken https://www.pricerunner.dk/gotostore/v1/DK/da6c8c3f2b612032f7288aa9f0930b65       
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/21dd79509ff4859f754df99377c9bef7
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/619b06827d891f914fa857eb1027e338
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/3de15df38e2f553e2b2d6ef0afb8dfa0
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/2b3426ba0afd10371d77b26abdec3bce
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/737059c3ae153a6ca36349bf31df256c
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/5e094ba7222b956865aa86213323f163
Ginbutikken https://www.pricerunner.dk/gotostore/v1/DK/a3245bb55f49d11c1687ad335f613351       
Bevco.dk https://www.pricerunner.dk/gotostore/v1/DK/cfaeb7678a99c30dfa1a4654a83a3497
Pandasia https://www.pricerunner.dk/gotostore/v1/DK/174c25fdb1be0717a85e0c5336950e10
Uhrskov Vine https://www.pricerunner.dk/gotostore/v1/DK/5efe765b951042caa5ab37daf56af172      
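
The example above fetches a single page (`size=48&offset=64`). To cover a whole category and end up with an Excel sheet, as the question asks, something along these lines might work. It is only a rough sketch and I have not run it against the live site. The endpoint and the `categoryOffers`, `merchant` and `url` keys come from the example above. The function names `build_api_url`, `iter_offers` and `export_category` are made up for illustration. It also rests on three assumptions: that increasing `offset` by `size` walks through the pages, that an empty `categoryOffers` list marks the end, and that the `af_56525176` filter parameter can be left out.

```python
import time
from urllib.parse import urlencode

BASE_URL = 'https://www.pricerunner.dk'
API_PATH = '/public/search/category/categoryoffers/dk/{}'


def build_api_url(category_id, offset, size=48):
    """Build the category API URL for one page (filter parameter omitted)."""
    query = urlencode({'size': size, 'offset': offset, 'sorting': 'RANK_asc'})
    return BASE_URL + API_PATH.format(category_id) + '?' + query


def iter_offers(fetch_json, category_id, size=48, max_pages=100):
    """Yield (merchant, url) tuples page by page.

    Assumption: an empty or missing 'categoryOffers' list means no more pages.
    max_pages is a safety cap so the loop always terminates.
    """
    for page in range(max_pages):
        payload = fetch_json(build_api_url(category_id, page * size, size))
        offers = payload.get('categoryOffers') or []
        if not offers:
            break
        for item in offers:
            yield item['merchant']['name'], BASE_URL + item['url']


def export_category(category_id, xlsx_path, delay=1.0):
    """Fetch every page of one category and save the rows to an Excel file."""
    # Third-party imports live here so the helpers above work without them.
    import openpyxl
    import requests

    def fetch_json(url):
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        time.sleep(delay)  # pause between requests
        return response.json()

    workbook = openpyxl.Workbook()
    sheet = workbook.active
    sheet.append(['merchant', 'URL'])
    for row in iter_offers(fetch_json, category_id):
        sheet.append(row)
    workbook.save(xlsx_path)

# Example call (performs real HTTP requests):
# export_category(1424, 'SpiritsUrlsPricerunner.xlsx')
```

Because `iter_offers` takes the fetch function as an argument, you can try the paging logic with a fake response before sending real requests. The category id 465 from the Vin URL in the question would presumably work the same way, but that is untested.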