Scrapy: a single spider yields more than one item type, and the pipelines receive the wrong yield data

Time:09-24

Two pipelines are enabled in the settings, each with its own priority.
The spider yields twice: once for the text and once for the image URL.
The items file defines two item classes; the spider instantiates both and stores the scraped data in them.
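
For reference, the setup described above might look roughly like the sketch below. The class names, the module path `myproject.pipelines` and the field names are all assumptions made for illustration, not taken from the original project. Dataclasses are used so the sketch runs without Scrapy installed (Scrapy also accepts dataclass objects as items). Note that Scrapy runs pipelines in ascending order of the number in ITEM_PIPELINES, so the "high priority" pipeline is the one with the smaller value.

```python
from dataclasses import dataclass, field


# Hypothetical item classes, one per kind of data the spider yields.
@dataclass
class TextItem:
    text: str = ""


@dataclass
class ImageItem:
    image_urls: list = field(default_factory=list)


# This dict would live in settings.py. The paths are hypothetical.
# Smaller number = runs earlier = "higher priority" in the thread's wording.
ITEM_PIPELINES = {
    "myproject.pipelines.TextPipeline": 300,
    "myproject.pipelines.ImagePipeline": 400,
}


def parse(response=None):
    """Stand-in for the spider callback: yields one item of each type."""
    yield TextItem(text="some text")
    yield ImageItem(image_urls=["https://example.com/a.png"])


# The order in which the pipelines see every item.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```

Every yielded item goes through every enabled pipeline in that order, which is why each pipeline needs its own type check.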
Problems that show up in the pipelines:
1. Both items pass through both pipelines, even though each pipeline checks the item type with an if. This is not the main issue.
2. Of the two yields, only the high-priority pipeline receives the yielded data. This is the main issue.
3. Why does the lower-priority pipeline not receive the yielded data? Even when the item reaches the right pipeline, the data is missing:
def process_item(self, item, spider):
    print("form pipeline has entered...", item)
    # output: form pipeline has entered... None
4. The other yield, the one that goes to the image pipeline, does deliver its item data.
I am new to this, so any pointers would be much appreciated!

CodePudding user response:

You can set up a middleware to avoid the confusion.

CodePudding user response:

Quoting the reply on the 1st floor:
You can set up a middleware to avoid the confusion.

How exactly would the middleware be implemented? Could you write some pseudocode to show it?

CodePudding user response:

Could an expert please weigh in? I have been waiting for two days.

CodePudding user response:

OP, did you solve your problem? I ran into the same thing today. Even with a type check in the pipelines, both pipelines receive items of both classes; one of them gets the data, and the other gets None for everything.

CodePudding user response:

Could it be that you did not return the item in the high-priority pipeline?
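
To illustrate why a missing return would produce the None seen in the question, here is a toy model of a pipeline chain. It is a simplified stand-in written for this thread, not Scrapy's actual pipeline manager, and all class and function names are hypothetical. It relies on one behaviour: each pipeline receives whatever the previous pipeline's process_item returned.

```python
class ForgetfulTextPipeline:
    """Runs first; looks at the item but forgets to return it."""

    def process_item(self, item, spider):
        print("text pipeline got:", item)
        # No return statement, so Python implicitly returns None.


class FixedTextPipeline:
    """Runs first; same work, but hands the item on."""

    def process_item(self, item, spider):
        print("text pipeline got:", item)
        return item


class RecordingPipeline:
    """Runs second; records what it was given."""

    def __init__(self):
        self.seen = []

    def process_item(self, item, spider):
        self.seen.append(item)
        return item


def run_chain(pipelines, item, spider=None):
    """Toy chain: each stage gets the previous stage's return value."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


image_item = {"image_url": "https://example.com/a.png"}

broken_tail = RecordingPipeline()
run_chain([ForgetfulTextPipeline(), broken_tail], image_item)

fixed_tail = RecordingPipeline()
run_chain([FixedTextPipeline(), fixed_tail], image_item)
```

In this model `broken_tail.seen` holds None while `fixed_tail.seen` holds the original item, which matches the symptom described above.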

CodePudding user response:

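Assume the high-priority pipeline handles text (process_item_text) and the low-priority pipeline handles URLs (process_item_url). Both names stand for the process_item method of the respective pipeline class.
Suppose an item_url item enters the pipelines. process_item_text runs first and checks whether the item is an Item_Text instance. If it is, it prints it as its processing step. It then returns the item (with return, not yield, because the next pipeline receives whatever process_item returns), so an item that is not an Item_Text moves on to the lower-priority process_item_url.
I don't know whether this answers your question.

def process_item_text(self, item, spider):
    # High-priority pipeline: only acts on text items.
    if isinstance(item, Item_Text):
        print("the item is the text")
    # Return the item in every case so the next pipeline receives it.
    return item

def process_item_url(self, item, spider):
    # Low-priority pipeline: gets whatever process_item_text returned.
    print("the item is a url")
    return item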
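
One detail is worth checking in this kind of routing: a method whose body contains yield is a generator function, so calling it returns a generator object rather than the item itself. The sketch below shows type-based routing with return and contrasts it with a yielding variant. Item_Url, the pipeline class names and the field names are assumptions for illustration; only Item_Text appears in the thread. The code is plain Python and does not need Scrapy.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Item_Text:
    text: str


@dataclass
class Item_Url:  # hypothetical counterpart to Item_Text
    url: str


class TextPipeline:
    def process_item(self, item, spider):
        if isinstance(item, Item_Text):
            print("the item is the text")
        return item  # always pass the item on


class UrlPipeline:
    def process_item(self, item, spider):
        if isinstance(item, Item_Url):
            print("the item is a url")
        return item


class YieldingPipeline:
    def process_item(self, item, spider):
        yield item  # makes this a generator function


url_item = Item_Url(url="https://example.com/a.png")

# Routing with return: the URL item passes through the text pipeline intact.
passed_on = TextPipeline().process_item(url_item, spider=None)
received = UrlPipeline().process_item(passed_on, spider=None)

# With yield, the caller gets a generator object instead of the item.
leaked = YieldingPipeline().process_item(url_item, spider=None)
```

Here `received` is the same object as `url_item`, while `leaked` is a generator, so a later pipeline that expects item fields would not find them.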