Scrapy error shown when there is no data to scrape for an object

Time:10-02

I keep getting the error "'NoneType' object is not subscriptable" when I run my Scrapy code. I understand that the value is None, but how do I skip it and tell Scrapy to record this field as empty?

Here is the method:

def parse_country(self, response):
    try:

        item = response.meta['item']
        link_id = response.meta['link_id']
        place_data = json.loads(response.body)
        
        place_country = place_data[0][0][0]

        item['place_country'] = place_country

        yield item
    
    except Exception as e:
        print(e)    

The error only shows up when there is no data to scrape.
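A quick way to see where the message comes from is to index into a few hypothetical payloads in plain Python (the sample data below is made up, not taken from the real site). Indexing into None raises the TypeError from the question, while indexing into an empty list raises IndexError instead, so a missing level can surface as either error.

```python
# Hypothetical payloads standing in for json.loads(response.body)
samples = {
    "full": [[["France"]]],
    "null_top": None,      # the whole JSON body is null
    "null_inner": [[None]],
    "empty_inner": [[]],
}


def try_extract(place_data):
    """Return place_data[0][0][0], or the name and message of the error raised."""
    try:
        return place_data[0][0][0]
    except (TypeError, IndexError) as e:
        return f"{type(e).__name__}: {e}"


for name, data in samples.items():
    print(name, "->", try_extract(data))
```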

Answer:

A try/except block is meant for catching unexpected errors or bugs. For an expected case like missing data, I would suggest an if/else check instead.

Something like this could work for you:

def parse_country(self, response):
    item = response.meta['item']
    link_id = response.meta['link_id']
    place_data = json.loads(response.body)

    # Note: this only guards the innermost value; the indexing itself still
    # raises TypeError if place_data, place_data[0] or place_data[0][0] is None.
    if place_data[0][0][0] is not None:
        item['place_country'] = place_data[0][0][0]
    else:
        item['place_country'] = 'No Country found'

    yield item

Answer:

Note that using a try block as a control-flow statement is not good practice.

When you write place_data[0][0][0], you are indexing into a three-level nested list. If place_data or either of the first two levels is None, you get this TypeError; if any level is an empty list, you get an IndexError instead.

The solution is to check for None and for length at each level. You can do it in one if statement like this:

if place_data and len(place_data) > 0 \
        and place_data[0] and len(place_data[0]) > 0 \
        and place_data[0][0] and len(place_data[0][0]) > 0 \
        and place_data[0][0][0]:
    item['place_country'] = place_data[0][0][0]
else:
    item['place_country'] = None

Or you could break it down into multiple, nested if statements for better readability.
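The same per-level checks can also be written as a loop that walks one index at a time, which avoids the long condition and works for any depth. This is a minimal sketch; get_nested is a hypothetical helper name (not part of Scrapy), and it assumes every level of the payload is a list or tuple.

```python
from typing import Any


def get_nested(data: Any, *indices: int, default: Any = None) -> Any:
    """Return data[i0][i1]... or `default` if any level is missing.

    A level counts as missing when it is None, is not a list/tuple,
    or is too short for the requested index.
    """
    current = data
    for index in indices:
        if not isinstance(current, (list, tuple)):
            return default
        if not -len(current) <= index < len(current):
            return default
        current = current[index]
    # Treat an explicit None (JSON null) at the last level as missing too
    return default if current is None else current
```

Inside the callback this becomes a single line, e.g. item['place_country'] = get_nested(place_data, 0, 0, 0), or pass default='No Country found' to store a placeholder instead of None.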

Side note #1: Since Scrapy 1.7, Request.cb_kwargs is the recommended way to pass data to a callback; Request.meta is meant for communicating with components such as middlewares and extensions. See the docs.

Side note #2: You can get the parsed JSON directly by calling response.json() (available since Scrapy 2.2) instead of json.loads(response.body).
