Error handling in variables when using Requests

Time:04-27

The code below works fine until I set x=37, at which point I receive the error

TypeError: 'NoneType' object is not subscriptable on the variable t["vintage"]["wine"]["region"]["country"]["name"].

I have added another variable on which the same issue happens almost every time, so you may hit the error there instead.

I think this is because one of the 25 results on that page does not have a country name assigned to it, and therefore that lookup raises the error.
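One way to confirm this hypothesis is to report, for each match on the page, the first key along the region/country path that cannot be looked up. This is a hypothetical diagnostic sketch: the function name `first_missing_level` and the sample data are illustrative and not part of the Vivino API. With the real response you would pass `r.json()["explore_vintage"]["matches"]` instead of the sample list.

```python
def first_missing_level(match, path):
    """Return the first key in `path` that cannot be looked up, or None if the whole path resolves."""
    node = match
    for key in path:
        # A None node (or any non-dict) or a missing key means the path stops here
        if not isinstance(node, dict) or key not in node:
            return key
        node = node[key]
    return None


# Hypothetical sample shaped like the matches in the question
matches = [
    {"vintage": {"wine": {"region": {"country": {"name": "France"}}}}},
    {"vintage": {"wine": {"region": None}}},
]

path = ["vintage", "wine", "region", "country", "name"]
for i, match in enumerate(matches):
    missing = first_missing_level(match, path)
    if missing is not None:
        print(f"match {i}: cannot read {missing!r}")
```

For the second sample match this reports `'country'`, because `region` is None and so nothing below it can be read.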

I think I need to add exception handling to each variable for the case where the value is missing. The examples I have seen of adding except clauses deal with the request failing to find a valid page, rather than with an individual variable, and I can't find guidance on adding them at the variable level.

# Import packages
import requests
import pandas as pd

x=37

# Get request from the Vivino website
r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        #"country_code": "FR",
        #"country_codes[]":"pt",
        "currency_code":"GBP",
        "grape_filter":"varietal",
        "min_rating":"1",
        "order_by":"price",
        "order":"asc",
        "page": x,
        "price_range_max":"100",
        "price_range_min":"25",
        "wine_type_ids[]":"1"
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)

# Variables to scrape from the Vivino website
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        t["vintage"]["year"],
        t["vintage"]["wine"]["id"],
        t["vintage"]["wine"]["name"],
        t["vintage"]["statistics"]["ratings_average"],
        t["prices"][0]["amount"],
        t["vintage"]["wine"]["region"]["country"]["name"],
        t["vintage"]["wine"]["region"]["country"]["code"],
        t["vintage"]["wine"]["region"]["name"],
        t["vintage"]["wine"]["style"]["name"]
    )
    for t in r.json()["explore_vintage"]["matches"]
]

# Saving the results in a dataframe
dataframe = pd.DataFrame(
    results,
    columns=["Winery", "Vintage", "Wine ID", "Wine", "Rating", "Price", "Country", "CountryCode", "Region", "Style"]
)
    
#output the dataframe
df_out = dataframe
df_out.to_csv("data.csv", index=False)
print("Complete -",x,"iterations")

Answer:

The problem is that some values in a deeply nested dictionary are randomly missing (set to None). Here is a sample that reproduces the problem:

data = [
  {'k1': {'k2': {'k3': 'value_i_want'}}},
  {'k1': {'k2': None}},
  {'k1': {'k2': {'k3': 'value_i_want'}}},
]

Your code assumes that the key k3 exists in every dictionary in the list, but it does not. So when you try something like

result = [t['k1']['k2']['k3'] for t in data]

You get TypeError: 'NoneType' object is not subscriptable.

The TypeError arises in the second iteration of the for-loop: t['k1']['k2'] evaluates to None, and you then try to look up a key in it. You are effectively asking Python to execute None['k3'], which is exactly what the error message describes.
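A minimal, self-contained reproduction using the sample data shows where the exception comes from, and that catching `TypeError` around the lookup is enough to keep the loop going. This is only an illustration of the mechanism, not the recommended final form.

```python
data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

results = []
for t in data:
    try:
        results.append(t['k1']['k2']['k3'])
    except TypeError as exc:
        # t['k1']['k2'] is None here, so the lookup is effectively None['k3']
        print(f"{t!r}: {exc}")
        results.append(None)

print(results)  # ['value_i_want', None, 'value_i_want']
```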

To solve this issue (which is very common with data returned by API requests), wrap the lookup in a try/except block. You may find this helper function useful:

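def try_to_get(d, *args, default=None):
    # Walk down the nested dicts/lists one key (or list index) at a time
    try:
        for k in args:
            d = d[k]
        return d
    # KeyError: key missing; TypeError: a level is None; IndexError: empty list, e.g. "prices"
    except (KeyError, IndexError, TypeError):
        print(f'Cannot find the key {args}')
        return default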

Using the helper function, you can write try_to_get(t, 'k1', 'k2', 'k3'). A well-formed dictionary is traversed down through the nesting levels to the value you want, while a problematic one triggers the except block and returns a default value instead (None unless you pass default=...).
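Applied to the sample data, the helper returns the value where the path exists and the default elsewhere. The sketch repeats the helper so it runs on its own; this copy also catches `IndexError`, which matters for a path like `"prices", 0` if the list comes back empty (an assumption about the API data), since indexing an empty list raises `IndexError` rather than `KeyError` or `TypeError`.

```python
def try_to_get(d, *args, default=None):
    # Walk the nested dicts/lists one key or index at a time
    try:
        for k in args:
            d = d[k]
        return d
    except (KeyError, IndexError, TypeError):
        return default


data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

values = [try_to_get(t, 'k1', 'k2', 'k3') for t in data]
print(values)  # ['value_i_want', None, 'value_i_want']

# A custom default plus a list index: the empty list raises IndexError, which is caught
print(try_to_get({'prices': []}, 'prices', 0, 'amount', default='n/a'))  # n/a
print(try_to_get({'prices': [{'amount': 30.0}]}, 'prices', 0, 'amount'))  # 30.0
```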

You can try to replace the list comprehension part in your code with this:

results = [
    (
        try_to_get(t, "vintage", "wine", "winery", "name"),
        try_to_get(t, "vintage", "year"),
        try_to_get(t, "vintage", "wine", "id"),
        try_to_get(t, "vintage", "wine", "name"),
        try_to_get(t, "vintage", "statistics", "ratings_average"),
        try_to_get(t, "prices", 0, "amount"),
        try_to_get(t, "vintage", "wine", "region", "country", "name"),
        try_to_get(t, "vintage", "wine", "region", "country", "code"),
        try_to_get(t, "vintage", "wine", "region", "name"),
        try_to_get(t, "vintage", "wine", "style", "name"),
    )
    for t in r.json()["explore_vintage"]["matches"]
]
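To check the replacement without calling the API, you can run the same lookups over a hand-written list shaped like `explore_vintage.matches`. The sample below is made up for illustration (it is not real Vivino output), `FIELDS` and `build_rows` are hypothetical names, and the helper is a quiet variant (no print, also catching `IndexError`). Keeping the key paths in one list, in the same order as the DataFrame columns, is a design choice that makes it easy to add or reorder fields.

```python
def try_to_get(d, *args, default=None):
    # Follow keys/indices; return default if any step fails
    try:
        for k in args:
            d = d[k]
        return d
    except (KeyError, IndexError, TypeError):
        return default


# Same order as the columns: Winery, Vintage, Wine ID, Wine, Rating,
# Price, Country, CountryCode, Region, Style
FIELDS = [
    ("vintage", "wine", "winery", "name"),
    ("vintage", "year"),
    ("vintage", "wine", "id"),
    ("vintage", "wine", "name"),
    ("vintage", "statistics", "ratings_average"),
    ("prices", 0, "amount"),
    ("vintage", "wine", "region", "country", "name"),
    ("vintage", "wine", "region", "country", "code"),
    ("vintage", "wine", "region", "name"),
    ("vintage", "wine", "style", "name"),
]


def build_rows(matches):
    """One tuple per match, with None wherever a path is missing."""
    return [tuple(try_to_get(t, *path) for path in FIELDS) for t in matches]


# Made-up sample: the second match has no region, no style and no prices
sample = [
    {"vintage": {"year": 2018,
                 "statistics": {"ratings_average": 4.1},
                 "wine": {"id": 1, "name": "Wine A",
                          "winery": {"name": "Winery A"},
                          "region": {"name": "Bordeaux",
                                     "country": {"name": "France", "code": "fr"}},
                          "style": {"name": "Bordeaux Red"}}},
     "prices": [{"amount": 30.0}]},
    {"vintage": {"year": 2019,
                 "statistics": {"ratings_average": 3.9},
                 "wine": {"id": 2, "name": "Wine B",
                          "winery": {"name": "Winery B"},
                          "region": None,
                          "style": None}},
     "prices": []},
]

for row in build_rows(sample):
    print(row)
```

The second row comes out with None in the Price, Country, CountryCode, Region and Style positions instead of raising, so the list can be passed straight to `pd.DataFrame` as in the question.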