How to handle specific error messages in Python

Time:08-17

Sometimes I get the following message:

in process_item item['external_link_rel'] = dict_["rel"]
KeyError: 'rel'

It must be because the key doesn't exist. I tried to handle it but failed.

from lxml import etree

class CleanItem():

    def process_item(self, item, spider):

        try:
            root = etree.fromstring(str(item['external_link_body']).split("'")[1])
            dict_ = {}
            dict_.update(root.attrib)
            dict_.update({'text': root.text})
            item['external_link_rel'] = dict_["rel"]
            return item

        except KeyError as EmptyVar:
            if str(EmptyVar) == 'rel':
                dict_["rel"] = "null"
                item['external_link_rel'] = dict_["rel"]
                return item

Most likely the problem is in this line: if str(EmptyVar) == 'rel'.


I would appreciate guidance on how to perform an operation only when this specific error occurs.
Before asking, I did a lot of research but did not reach a conclusion.
For reference, the code above is in the pipelines.py file of a Scrapy project.

CodePudding user response:

A better way is to use the dictionary method get, which returns a default value when the key is missing instead of raising KeyError.

from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        root = etree.fromstring(str(item['external_link_body']).split("'")[1])
        dict_ = {}
        dict_.update(root.attrib)
        dict_.update({'text': root.text})
        item['external_link_rel'] = dict_.get("rel", "null")
        return item
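
To see the behaviour of get in isolation, here is a minimal sketch that uses plain dictionaries in place of the lxml attributes. The helper name rel_or_default and the sample dictionaries are made up for illustration and are not part of the pipeline above.

```python
def rel_or_default(attrib, default="null"):
    """Return attrib['rel'] if present, otherwise the default value."""
    # dict.get never raises KeyError; it falls back to the second argument.
    return attrib.get("rel", default)


# Hypothetical attribute dictionaries, standing in for root.attrib.
with_rel = {"href": "https://example.com", "rel": "nofollow"}
without_rel = {"href": "https://example.com"}

print(rel_or_default(with_rel))     # nofollow
print(rel_or_default(without_rel))  # null
```

Note that get does not add the key to the dictionary; it only supplies a value for that one lookup.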

CodePudding user response:

Why not just use a conditional statement?

from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        root = etree.fromstring(str(item['external_link_body']).split("'")[1])
        dict_ = {}
        dict_.update(root.attrib)
        dict_.update({'text': root.text})
        if 'rel' not in dict_:            # If 'rel' is not a key in dict_
            dict_["rel"] = "null"         # fall back to a default value
        item['external_link_rel'] = dict_["rel"]
        return item

If you really wanted to use try/except clauses you could do this. I would never recommend using try/except where it isn't necessary though.

def process_item(self, item, spider):
    root = etree.fromstring(str(item['external_link_body']).split("'")[1])
    dict_ = {}
    dict_.update(root.attrib)
    dict_.update({'text': root.text})
    try:
        item['external_link_rel'] = dict_["rel"]
        return item
    except KeyError:
        dict_["rel"] = "null"
        item['external_link_rel'] = dict_["rel"]
        return item
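
As for why the check str(EmptyVar) == 'rel' in the question never matches: converting a KeyError to a string gives the repr of the missing key, quotes included. The short sketch below shows this with an empty dictionary. The variable names are illustrative only, and reading the key from exc.args[0] is one possible way to test which key was missing, not the only one.

```python
try:
    {}["rel"]                 # an empty dict, so this lookup raises KeyError
except KeyError as exc:
    as_text = str(exc)        # the repr of the key: "'rel'" (with quotes)
    missing_key = exc.args[0] # the key itself: 'rel'

print(as_text == "rel")       # False, which is why the original check failed
print(missing_key == "rel")   # True
```

If the handler should react only to one particular key, comparing exc.args[0] against that key and re-raising otherwise keeps unrelated KeyErrors from being silently swallowed.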