I have the code below, which works fully until I set x=37. At that point, I receive the error
TypeError: 'NoneType' object is not subscriptable
on the variable t["vintage"]["wine"]["region"]["country"]["name"].
I have added another variable on which the same issue occurs almost every time, so you may see the error there instead.
I think this is because one of the 25 results on that page does not have a country name assigned to it and therefore the variable is giving an error.
I think I need to add exception handling to each variable to cover this case. I have seen examples of adding these, except they seem to handle the request not finding a legitimate page rather than an individual variable, and I can't find guidance on adding them at the variable level.
# Import packages
import requests
import json
import pandas as pd
import time
x = 37
# Get request from the Vivino website
r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        # "country_code": "FR",
        # "country_codes[]": "pt",
        "currency_code": "GBP",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": x,
        "price_range_max": "100",
        "price_range_min": "25",
        "wine_type_ids[]": "1",
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)
# Variables to scrape from the Vivino website
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        t["vintage"]["year"],
        t["vintage"]["wine"]["id"],
        t["vintage"]["wine"]["name"],
        t["vintage"]["statistics"]["ratings_average"],
        t["prices"][0]["amount"],
        t["vintage"]["wine"]["region"]["country"]["name"],
        t["vintage"]["wine"]["region"]["country"]["code"],
        t["vintage"]["wine"]["region"]["name"],
        t["vintage"]["wine"]["style"]["name"],
    )
    for t in r.json()["explore_vintage"]["matches"]
]
# Saving the results in a dataframe
dataframe = pd.DataFrame(
    results,
    columns=["Winery", "Vintage", "Wine ID", "Wine", "Rating", "Price", "Country", "CountryCode", "Region", "Style"],
)
# Output the dataframe
df_out = dataframe
df_out.to_csv("data.csv", index=False)
print("Complete -", x, "iterations")
CodePudding user response:
The problem is that, in a deeply nested dictionary, some values are None where you expect another dictionary. A sample list demonstrating the problem:
data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]
You might assume the key k3 exists in every dictionary in the list, but it does not. Hence when you try to do something like
result = [t['k1']['k2']['k3'] for t in data]
you get TypeError: 'NoneType' object is not subscriptable.
The TypeError arises when t['k1']['k2'] evaluates to None in the second iteration of the loop and you attempt to look up a key in it. You are basically asking the program to execute None['k3'], which explains the error message you got.
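To see which item fails, the same lookup can be run inside a loop that records the error instead of stopping. This is only a minimal sketch on the sample data above; the name failures is introduced here for illustration.

```python
# Sample data from above: the second item has None where a dict is expected.
data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

failures = []  # hypothetical name: collects (index, error message) pairs
for i, t in enumerate(data):
    try:
        t['k1']['k2']['k3']
    except TypeError as e:
        # Only the item whose 'k2' is None ends up here.
        failures.append((i, str(e)))

print(failures)
```

Only index 1 is reported, which matches the second dictionary in the list.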
To solve this issue (which is very common in data returned from API requests), you will need to wrap the lookup in a try/except block. You may find this helper function useful:
def try_to_get(d: dict, *args, default=None):
    try:
        # Walk down one level per key
        for k in args:
            d = d[k]
        return d
    except (KeyError, TypeError):
        print(f'Cannot find the key {args}')
        return default
Using the helper function, we can write try_to_get(t, 'k1', 'k2', 'k3'). A well-formed dictionary is traversed level by level and yields the value you want, while a problematic one triggers the except block and returns a default value (here, the default value is None).
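As a quick check, the helper can be run against the sample data from earlier. The sketch below repeats the helper so that it runs on its own; the 'N/A' default and the names with_none and with_label are illustrative choices, not part of the original code.

```python
def try_to_get(d, *args, default=None):
    try:
        for k in args:
            d = d[k]
        return d
    except (KeyError, TypeError):
        print(f'Cannot find the key {args}')
        return default

data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

# Default of None for the broken item
with_none = [try_to_get(t, 'k1', 'k2', 'k3') for t in data]
# A custom placeholder instead of None (illustrative)
with_label = [try_to_get(t, 'k1', 'k2', 'k3', default='N/A') for t in data]

print(with_none)   # ['value_i_want', None, 'value_i_want']
print(with_label)  # ['value_i_want', 'N/A', 'value_i_want']
```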
You can try to replace the list comprehension part in your code with this:
results = [
    (
        try_to_get(t, "vintage", "wine", "winery", "name"),
        try_to_get(t, "vintage", "year"),
        try_to_get(t, "vintage", "wine", "id"),
        try_to_get(t, "vintage", "wine", "name"),
        try_to_get(t, "vintage", "statistics", "ratings_average"),
        try_to_get(t, "prices", 0, "amount"),
        try_to_get(t, "vintage", "wine", "region", "country", "name"),
        try_to_get(t, "vintage", "wine", "region", "country", "code"),
        try_to_get(t, "vintage", "wine", "region", "name"),
        try_to_get(t, "vintage", "wine", "style", "name"),
    )
    for t in r.json()["explore_vintage"]["matches"]
]
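One case may be worth checking. The helper catches KeyError and TypeError, but indexing an empty list (for instance, if "prices" came back as []) raises IndexError, which would still escape. Whether the Vivino API ever returns an empty prices list is an assumption here, not something confirmed. A sketch of a variant that also catches IndexError, run against made-up records shaped loosely like the response (try_to_get_safe, matches and rows are hypothetical names):

```python
def try_to_get_safe(d, *keys, default=None):
    """Hypothetical variant of try_to_get that also tolerates empty or short lists."""
    try:
        for k in keys:
            d = d[k]
        return d
    except (KeyError, TypeError, IndexError):
        return default

# Made-up records for illustration only; real API items carry many more fields.
matches = [
    {
        "vintage": {"year": 2018, "wine": {"region": {"country": {"name": "France"}}}},
        "prices": [{"amount": 25.0}],
    },
    {
        "vintage": {"year": 2019, "wine": {"region": None}},  # region is None
        "prices": [],                                         # empty price list
    },
]

rows = [
    (
        try_to_get_safe(t, "vintage", "year"),
        try_to_get_safe(t, "prices", 0, "amount"),
        try_to_get_safe(t, "vintage", "wine", "region", "country", "name"),
    )
    for t in matches
]
print(rows)  # [(2018, 25.0, 'France'), (2019, None, None)]
```

pandas treats None entries in these tuples as missing values, so the DataFrame and to_csv steps in the question can stay as they are.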