I'm trying to scrape data from Vivino. So far I have managed to use the API and read from the JSON file by following this post:
CodePudding user response:
The data is embedded in the page's JavaScript, in the window.__PRELOADED_STATE__.winePageInformation object, like this:
<script>
window.__PRELOADED_STATE__ = ....
window.__PRELOADED_STATE__.winePageInformation = { very long JSON here }
</script>
You can use a regex to extract it, and the result seems to be valid JSON:
import json
import re

import requests

url = "https://www.vivino.com/DK/en/pierre-amadieu-gigondas-romane-machotte-rouge/w/73846"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
}
r = requests.get(url, headers=headers)
r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page

# Capture the JavaScript object: everything between the assignment and the next ";"
res = re.search(r"^.*window\.__PRELOADED_STATE__\.winePageInformation\s*=\s*([^;]*);", r.text, re.DOTALL)
if res is None:
    raise ValueError("winePageInformation not found in the page")
data = json.loads(res.group(1))

print("recommended vintages")
print(data["recommended_vintages"])
print("all vintages")
print(data["wine"]["vintages"])
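One caveat with the regex: the capture group stops at the first ";", so it would cut the JSON short if any string value in the object contained a semicolon. A possible alternative, sketched here under the assumption that the assigned value is plain JSON, is to locate the assignment and let json.JSONDecoder.raw_decode parse a single value from that position. The helper name extract_js_object and the sample page are made up for illustration; they are not part of Vivino or of the answer above.

```python
import json
import re


def extract_js_object(html: str, name: str) -> dict:
    """Return the JSON object assigned to `name` inside `html` (hypothetical helper)."""
    # Find "name = " and remember where the assigned value starts.
    match = re.search(re.escape(name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"{name} not found in page")
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (the trailing ";" and the rest of the page).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


# Tiny stand-in for the real page; note the ";" inside a string value.
sample = """<script>
window.__PRELOADED_STATE__ = {};
window.__PRELOADED_STATE__.winePageInformation = {"wine": {"name": "a; b", "vintages": [{"id": 1}]}};
</script>"""

info = extract_js_object(sample, "window.__PRELOADED_STATE__.winePageInformation")
print(info["wine"]["vintages"])
```

On the real page you would pass r.text instead of sample. If the value turns out not to be strict JSON, raw_decode raises json.JSONDecodeError rather than returning a truncated result.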
CodePudding user response:
Bertrand has given you the best answer. Perhaps oddly, the endpoint you are hitting does not accept a wine id and return all of that wine's vintages. The available parameters are:
country_code, country_codes, currency_code, discount_prices, food_ids,
grape_ids, grape_filter, max_rating, merchant_id, merchant_type, min_rating,
min_ratings_count, order_by, order, page, per_page, price_range_max,
price_range_min, region_ids, wine_style_ids, wine_type_ids, winery_ids,
vintage_ids, wine_years, excluding_vintage_id, wsa_year, top_list_filter
These are detailed in this JS file: https://www.vivino.com/packs/common-8f26f13b0ac53f391471.js
You would need to determine the vintage ids first and pass those to the API instead.
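As a rough sketch of that last step, the snippet below collects ids from a list of vintage entries and encodes them as a vintage_ids query parameter. It makes several assumptions that are not confirmed anywhere in this thread: that each vintage entry is a dict with an "id" key, and that the API accepts the parameter as a repeated key (it might expect a different array encoding). The helper name vintage_query and the sample ids are invented for illustration.

```python
from urllib.parse import urlencode


def vintage_query(vintages: list, **extra) -> str:
    """Build a query string carrying the vintage ids (hypothetical helper)."""
    # Assumption: each vintage entry is a dict with an "id" key.
    ids = [v["id"] for v in vintages if "id" in v]
    params = {"vintage_ids": ids, **extra}
    # doseq=True emits one key=value pair per list element.
    return urlencode(params, doseq=True)


# Made-up entries standing in for data["wine"]["vintages"].
sample_vintages = [{"id": 101, "year": 2015}, {"id": 102, "year": 2016}]
query = vintage_query(sample_vintages, per_page=25)
print(query)
```

The resulting string can be appended to whichever endpoint you are already calling, or the same dict can be handed to requests.get through its params argument.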