I am trying to scrape the price of a single item from the website at the URL below, but I am running into problems when looking at the page source.
The url is: https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love
The part of the page source I am interested in is (I think) the following:
<script type="application/ld+json">
[{
"@context":"http://schema.org",
"@type":"Product",
"productID":"25372685655708131",
"name":"LOVE bracelet, small model",
"description":"#LOVE# bracelet, small model, yellow gold 750/1000. Supplied with a screwdriver. Width: 3.65 mm (for size 17). Now available in a slimmer version, Cartier continues to write the story of the #LOVE# bracelet. Same design, same oval shape, same story: a timeless – yet slightly slimmer – creation which is fastened using a screwdriver. The closure is designed with a functional screw on one side of the bracelet and a hinge on the other. To determine the size of your #LOVE# bracelet, measure your wrist, adding one centimetre to your size for a tighter fit, or two centimetres for a looser fit.",
"image":["https://www.cartier.com/variants/images/25372685655708131/img1/w960.jpg"],
"offers":
[{"@type":"Offer","availability":"http://schema.org/InStock","priceCurrency":"GBP","price":"4100","sku":"0400574782829","url":"https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html"}]}]
</script>
I have tried the following steps:
import json
from bs4 import BeautifulSoup
import requests
from multiprocessing import Pool
import pandas as pd

data = {'url':[],'offers_price':[]}

def get_price(url):
    soup = BeautifulSoup(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).content, "html.parser")
    data = json.loads(soup.find_all('script', {'type': 'application/ld+json'})[-1].get_text())
    return url, int(data['offers']['price'])

if __name__ == '__main__':
    urls = ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love']
    with Pool(processes=4) as pool:
        for url, price in pool.imap_unordered(get_price, urls):
            data['offers_price'].append(price)
            data['url'].append(url)
    print(data)
But it was not successful. How would you approach this?
Answer:
I was able to get the price, but from the data-model attribute of the product-price tag rather than from the JSON-LD script:
import json
from bs4 import BeautifulSoup
import requests
from multiprocessing import Pool
import pandas as pd

data = {'url':[],'offers_price':[]}

def get_price(url):
    soup = BeautifulSoup(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).content, "html.parser")
    # The <product-price> element carries its price data as JSON in the data-model attribute
    data = json.loads(soup.find_all('product-price')[-1]['data-model'])
    return url, int(data['fullPrice'])

if __name__ == '__main__':
    urls = ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love']
    with Pool(processes=4) as pool:
        for url, price in pool.imap_unordered(get_price, urls):
            data['offers_price'].append(price)
            data['url'].append(url)
    print(data)
Output:
{'url': ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love'], 'offers_price': [4100]}
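The JSON-LD route from the question can probably work too. The original code fails because the script's content is a list holding one Product object, and "offers" is itself a list of Offer objects, so data['offers']['price'] raises a TypeError. Below is a minimal sketch of a helper that copes with both levels of nesting; extract_price is a hypothetical name, and the sample string is a trimmed copy of the snippet shown in the question. In get_price you would pass it the text of the application/ld+json script tag instead of indexing the parsed JSON directly.

```python
import json


def extract_price(ld_json_text):
    """Return (currency, price) from a schema.org Product JSON-LD string, or None.

    Hypothetical helper: it assumes the Product/Offer layout seen in the
    question, where the top level and "offers" may each be a list.
    """
    data = json.loads(ld_json_text)
    # The top-level value may be a single object or a list of objects.
    items = data if isinstance(data, list) else [data]
    for item in items:
        if item.get("@type") != "Product":
            continue
        offers = item.get("offers", [])
        # "offers" may likewise be a single Offer or a list of Offers.
        if isinstance(offers, dict):
            offers = [offers]
        for offer in offers:
            if "price" in offer:
                # Prices are strings such as "4100"; float() also accepts "4100.00".
                return offer.get("priceCurrency"), int(float(offer["price"]))
    return None


# Trimmed copy of the JSON-LD block from the question.
sample = (
    '[{"@context":"http://schema.org","@type":"Product",'
    '"productID":"25372685655708131","name":"LOVE bracelet, small model",'
    '"offers":[{"@type":"Offer","availability":"http://schema.org/InStock",'
    '"priceCurrency":"GBP","price":"4100","sku":"0400574782829"}]}]'
)

print(extract_price(sample))  # → ('GBP', 4100)
```

Returning the currency alongside the price is a design choice, not a requirement; it helps if you later scrape other regional versions of the site.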
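By the way, do you need to append the url and the price? If you only ever scrape a single URL, you could assign the values directly instead:
data['offers_price'] = price
data['url'] = url
Keep appending, though, if urls will hold several pages, since assignment overwrites the previous result on every loop iteration.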
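To illustrate why appending matters once urls holds more than one page, here is a small stdlib-only sketch that fakes the pool's output. The URLs and prices are made-up placeholders, and collect is a hypothetical helper, not part of the code above.

```python
def collect(results):
    """Gather (url, price) pairs into the dict-of-lists layout used above."""
    data = {"url": [], "offers_price": []}
    for url, price in results:
        # Appending keeps one entry per URL; plain assignment
        # (data["url"] = url) would keep only the last pair.
        data["url"].append(url)
        data["offers_price"].append(price)
    return data


# Placeholder results standing in for pool.imap_unordered(get_price, urls)
results = [("https://example.com/a", 100), ("https://example.com/b", 250)]
print(collect(results))
# → {'url': ['https://example.com/a', 'https://example.com/b'], 'offers_price': [100, 250]}
```

A dict of equal-length lists is also a shape pd.DataFrame(data) accepts directly, which is presumably why pandas is imported in the first place.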