Scraping the "Compare Vintages" from Vivino

Time:10-17

I'm trying to scrape data from Vivino. So far I have managed to use the API and read the JSON it returns, following this post.

CodePudding user response:

The data is embedded in the page's JavaScript, in the window.__PRELOADED_STATE__.winePageInformation object, like this:

<script>
  window.__PRELOADED_STATE__ = ....
  window.__PRELOADED_STATE__.winePageInformation = { very long JSON here }
</script>

You can use a regex to extract it, and the result seems to be valid JSON:

import json
import re

import requests

url = "https://www.vivino.com/DK/en/pierre-amadieu-gigondas-romane-machotte-rouge/w/73846"
r = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)
r.raise_for_status()

# Capture everything between the assignment and the first semicolon.
# [^;] already matches newlines, so no re.DOTALL or leading "^.*" is needed.
res = re.search(r"window\.__PRELOADED_STATE__\.winePageInformation\s*=\s*([^;]*);", r.text)
if res is None:
    raise ValueError("winePageInformation not found in the page")
data = json.loads(res.group(1))

print("recommended vintages")
print(data["recommended_vintages"])

print("all vintages")
print(data["wine"]["vintages"])
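One caveat with the regex above: [^;]* stops at the first semicolon, so a ";" inside any string value would cut the capture short and json.loads would fail. A minimal sketch of a more tolerant variant, assuming the assignment is followed directly by a JSON value, finds the "=" with a regex and then lets json.JSONDecoder.raw_decode parse from that position and ignore whatever JavaScript follows. The helper name extract_wine_page_information and the sample HTML are made up for illustration.

```python
import json
import re

# Matches the assignment and stops right before the JSON value starts.
ASSIGNMENT = re.compile(r"window\.__PRELOADED_STATE__\.winePageInformation\s*=\s*")


def extract_wine_page_information(html: str) -> dict:
    """Parse the object assigned to winePageInformation (hypothetical helper)."""
    match = ASSIGNMENT.search(html)
    if match is None:
        raise ValueError("winePageInformation assignment not found")
    # raw_decode parses one JSON value starting at the given index and
    # returns it together with the end position, ignoring trailing text.
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# Made-up sample page; note the ";" inside a string value.
sample_html = """
<script>
  window.__PRELOADED_STATE__ = {};
  window.__PRELOADED_STATE__.winePageInformation = {"wine": {"name": "Red; dry", "vintages": [{"id": 1}]}};
</script>
"""

info = extract_wine_page_information(sample_html)
print(info["wine"]["name"])
```

With a real page, pass r.text from the request above in place of sample_html.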

CodePudding user response:

Bertrand has given you the best answer. Perhaps oddly, the endpoint you are hitting is not set up to accept a wine id and return all of its vintages. The available parameters are:

country_code, country_codes, currency_code, discount_prices, food_ids, 
grape_ids, grape_filter, max_rating, merchant_id, merchant_type, min_rating,
min_ratings_count, order_by, order, page, per_page, price_range_max, 
price_range_min, region_ids, wine_style_ids, wine_type_ids, winery_ids, 
vintage_ids, wine_years, excluding_vintage_id, wsa_year, top_list_filter

These are detailed in the JS file https://www.vivino.com/packs/common-8f26f13b0ac53f391471.js.

You would need to determine the vintage ids and pass those in to the API instead.
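
As a rough sketch of that last step, the vintage ids could be collected from the parsed page data and turned into query parameters. Everything specific here is an assumption: that each entry in data["wine"]["vintages"] carries an "id" field, that the endpoint accepts vintage_ids as a repeated key, and the helper name build_vintage_params. Check all of these against real responses before relying on them.

```python
from urllib.parse import urlencode


def build_vintage_params(data: dict, per_page: int = 25) -> dict:
    """Collect vintage ids from the page data (hypothetical helper)."""
    # Assumption: every vintage entry is a dict with an "id" key.
    vintage_ids = [v["id"] for v in data["wine"]["vintages"] if "id" in v]
    return {"vintage_ids": vintage_ids, "page": 1, "per_page": per_page}


# Made-up data shaped like the parsed winePageInformation object.
sample = {"wine": {"vintages": [{"id": 101}, {"id": 102}]}}
params = build_vintage_params(sample)

# doseq=True repeats the key once per id; whether the API expects this
# encoding (rather than e.g. "vintage_ids[]") is an assumption.
query = urlencode(params, doseq=True)
print(query)
```

The same dict can be passed as params= to requests.get against the endpoint you are already calling; requests also encodes list values as repeated keys.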
