Sometimes I get the following message:
in process_item item['external_link_rel'] = dict_["rel"]
KeyError: 'rel'
This must be because the 'rel' attribute doesn't exist on some elements. I tried to handle that case but failed.
from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        try:
            root = etree.fromstring(str(item['external_link_body']).split("'")[1])
            dict_ = {}
            dict_.update(root.attrib)
            dict_.update({'text': root.text})
            item['external_link_rel'] = dict_["rel"]
            return item
        except KeyError as EmptyVar:
            if str(EmptyVar) == 'rel':
                dict_["rel"] = "null"
                item['external_link_rel'] = dict_["rel"]
                return item
Most likely, the problem is in this line: if str(EmptyVar) == 'rel'.
I would appreciate guidance on how to perform an operation only when this error occurs.
Before asking, I did a lot of research but did not reach a conclusion.
For reference, the code above is in the pipelines.py file of a Scrapy project.
CodePudding user response:
A better way to do it is to use the dictionary method get, which returns a default value instead of raising KeyError when the key is missing.
from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        root = etree.fromstring(str(item['external_link_body']).split("'")[1])
        dict_ = {}
        dict_.update(root.attrib)
        dict_.update({'text': root.text})
        item['external_link_rel'] = dict_.get("rel", "null")
        return item
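To show the default-value lookup in isolation, here is a minimal sketch. It assumes the standard-library xml.etree.ElementTree as a stand-in for lxml (both expose Element.get with a default argument), and the helper name extract_rel and the sample fragments are hypothetical, not part of the original pipeline.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


def extract_rel(fragment, default="null"):
    """Return the rel attribute of a single-element fragment, or the default."""
    root = ET.fromstring(fragment)
    # Element.get returns the default when the attribute is absent,
    # so no KeyError handling is needed.
    return root.get("rel", default)


# Hypothetical sample fragments, one with rel and one without.
with_rel = extract_rel('<a href="https://example.com" rel="nofollow">x</a>')
without_rel = extract_rel('<a href="https://example.com">x</a>')
print(with_rel, without_rel)
```

Reading the attribute straight from the element also avoids copying root.attrib into an intermediate dictionary when only one key is needed.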
CodePudding user response:
Why not just use a conditional statement?
from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        root = etree.fromstring(str(item['external_link_body']).split("'")[1])
        dict_ = {}
        dict_.update(root.attrib)
        dict_.update({'text': root.text})
        if 'rel' not in dict_:  # If 'rel' is not a key in dict
            dict_["rel"] = "null"
            item['external_link_rel'] = dict_["rel"]
            return item
        item['external_link_rel'] = dict_["rel"]  # else ...
        return item
If you really want to use try/except, you could do the following, though I would not recommend try/except where it isn't necessary.
def process_item(self, item, spider):
    root = etree.fromstring(str(item['external_link_body']).split("'")[1])
    dict_ = {}
    dict_.update(root.attrib)
    dict_.update({'text': root.text})
    try:
        item['external_link_rel'] = dict_["rel"]
        return item
    except KeyError:
        dict_["rel"] = "null"
        item['external_link_rel'] = dict_["rel"]
        return item
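As for why the comparison in the question never matched: str() of a KeyError returns the repr of the missing key, quotes included, so str(EmptyVar) is "'rel'" rather than 'rel'. The branch was skipped and the method returned None. If the key name really has to be checked, comparing against args[0] is one option. The sketch below uses a hypothetical helper lookup_rel on a plain dictionary to illustrate this.

```python
def lookup_rel(attrs):
    """Return attrs['rel'], or "null" when that specific key is missing."""
    try:
        return attrs["rel"]
    except KeyError as exc:
        # str(exc) would be "'rel'" (with quotes), so compare the raw
        # key stored in exc.args[0] instead.
        if exc.args[0] == "rel":
            return "null"
        raise  # some other key was missing; do not hide it


# The string form of the exception keeps the quotes around the key.
str_form = str(KeyError("rel"))
print(repr(str_form))
print(lookup_rel({}), lookup_rel({"rel": "nofollow"}))
```

Re-raising in the fallback path keeps unrelated KeyErrors visible instead of silently returning None.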