How can you optimize the following code and write less?
class CleanItem():
    def process_item(self, item, spider):
        PSCV = str(item['page_source_canonical']).split("'")[1]
        if PSCV != "":
            if PSCV != item['page_source']:
                item['page_source_canonical_is_itself'] = False
            else:
                item['page_source_canonical_is_itself'] = True
        else:
            item['page_source_canonical_is_itself'] = True
        return item
First it checks whether the value is empty. If it is empty, the result is True. If it is not empty, it is compared with page_source: if the two are the same, the result is again True; otherwise it is False.
CodePudding user response:
You wrote
if PSCV != "":
    if PSCV != item['page_source']:
        item['page_source_canonical_is_itself'] = False
    else:
        item['page_source_canonical_is_itself'] = True
else:
    item['page_source_canonical_is_itself'] = True
You want
item['page_source_canonical_is_itself'] = PSCV in (
    '', item['page_source'])
That's DRY, and it clearly spells out the author's intent: assign True when PSCV matches either value, otherwise False.
Style nit: PEP 8 asks that you spell it pscv, lowercase.
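Consider deleting the unused spider parameter, unless a framework calls this method with that argument (a Scrapy item pipeline's process_item, for example, has traditionally received it).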
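Putting these suggestions together, the whole class could look like the sketch below. The sample items and URLs are hypothetical, and the spider parameter is kept here on the assumption that a framework supplies it.

```python
class CleanItem:
    def process_item(self, item, spider):
        # Same extraction as in the question: the text between the
        # first pair of single quotes in the stringified value.
        pscv = str(item['page_source_canonical']).split("'")[1]
        # True when the canonical value is empty or equals page_source.
        item['page_source_canonical_is_itself'] = pscv in ('', item['page_source'])
        return item


# Hypothetical sample items, for illustration only.
pipeline = CleanItem()
same = pipeline.process_item(
    {'page_source_canonical': "['https://example.com/a']",
     'page_source': 'https://example.com/a'}, None)
empty = pipeline.process_item(
    {'page_source_canonical': "['']",
     'page_source': 'https://example.com/a'}, None)
different = pipeline.process_item(
    {'page_source_canonical': "['https://example.com/b']",
     'page_source': 'https://example.com/a'}, None)
```

Note that split("'")[1] raises IndexError when the stringified value contains no single quote; that behavior is unchanged from the original code.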
CodePudding user response:
Yes, you could simplify
if PSCV != "":
    if PSCV != item['page_source']:
        item['page_source_canonical_is_itself'] = False
    else:
        item['page_source_canonical_is_itself'] = True
else:
    item['page_source_canonical_is_itself'] = True
to
if PSCV != "" and PSCV != item['page_source']:
    item['page_source_canonical_is_itself'] = False
else:
    item['page_source_canonical_is_itself'] = True
or even further to
condition = PSCV != "" and PSCV != item['page_source']
item['page_source_canonical_is_itself'] = not condition
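By De Morgan's law, not (a != "" and a != b) is the same as a == "" or a == b, so the negated form agrees with the original nested if. A small check under assumed sample values (the function names and test cases are made up for illustration):

```python
def nested(pscv, page_source):
    # The original nested if/else from the question.
    if pscv != "":
        if pscv != page_source:
            return False
        else:
            return True
    else:
        return True


def negated(pscv, page_source):
    # The "not condition" form.
    condition = pscv != "" and pscv != page_source
    return not condition


def either(pscv, page_source):
    # The De Morgan equivalent, written positively.
    return pscv == "" or pscv == page_source


# Hypothetical (pscv, page_source) pairs: empty, equal, different, both empty.
cases = [("", "a"), ("a", "a"), ("b", "a"), ("", "")]
results = [(nested(*c), negated(*c), either(*c)) for c in cases]
all_agree = all(len(set(r)) == 1 for r in results)
```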
CodePudding user response:
if PSCV != "":
    if PSCV != item['page_source']:
        item['page_source_canonical_is_itself'] = False
    else:
        item['page_source_canonical_is_itself'] = True
else:
    item['page_source_canonical_is_itself'] = True
If I wanted to refactor the code above, I would consider two options.
The most explanatory (not necessarily the shortest):
if PSCV == "":
    item['page_source_canonical_is_itself'] = True
elif PSCV == item['page_source']:
    item['page_source_canonical_is_itself'] = True
else:
    item['page_source_canonical_is_itself'] = False
The shorter, using a conditional expression (Python's form of the ternary):
item['page_source_canonical_is_itself'] = True if (PSCV == "" or PSCV == item['page_source']) else False