Scrapy: how do I avoid delays caused by slow proxies when crawling?

Time:09-21

Newbie asking for help.
The first approach:
When Scrapy crawls through a proxy and the proxy is slow, the request hangs until it reaches DOWNLOAD_TIMEOUT, then fails with an error and the thread is released. Nothing gets done during that wait, so if the proxies are low quality the whole crawl becomes very slow.
A multi-threaded crawler written in C# or Java does not seem to have this problem, because the threads do not affect each other.
Speed: a little over two per second
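
For context, Scrapy's downloader is asynchronous, so a request waiting on a slow proxy should only hold up the crawl when every concurrency slot is occupied by such requests. A minimal settings sketch along those lines follows. The values are illustrative assumptions, not the poster's configuration, and would need tuning for the actual proxies and target site.

```python
# settings.py (illustrative values only, not the original poster's settings)

# Give up on a slow proxy after 10 seconds (Scrapy's default is 180).
DOWNLOAD_TIMEOUT = 10

# Keep more requests in flight so a few slow proxies do not
# occupy every download slot (defaults are 16 and 8).
CONCURRENT_REQUESTS = 64
CONCURRENT_REQUESTS_PER_DOMAIN = 32

# Let the built-in retry middleware re-issue requests that timed out
# (RETRY_ENABLED defaults to True, RETRY_TIMES to 2).
RETRY_ENABLED = True
RETRY_TIMES = 3
```

A single request can also carry its own limit through `meta={"download_timeout": 5}`, which may help when only some proxies are slow.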

Configuration:


The second approach:
Proxy checking
If each proxy is tested before it is assigned to a request, the problem above does not occur, but a new one appears:
when the proxy is working, the check is redundant, and the overall speed is lower than without the check.
Speed: about one per second
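
One possible middle ground is to skip the up-front check and instead learn from real requests: a proxy that keeps failing is benched for a while, and working proxies cost nothing extra. The sketch below is only an illustration under assumptions. `ProxyPool`, `PassiveProxyMiddleware`, and the `PROXY_LIST` setting are hypothetical names, not part of Scrapy; only the `process_request` / `process_response` / `process_exception` hooks and `request.meta["proxy"]` are Scrapy's own.

```python
import random
import time


class ProxyPool:
    """Hypothetical helper: benches a proxy after repeated failures."""

    def __init__(self, proxies, max_failures=2, cooldown=300.0,
                 clock=time.monotonic):
        self._failures = {p: 0 for p in proxies}
        self._banned_until = {}
        self.max_failures = max_failures
        self.cooldown = cooldown
        self._clock = clock

    def available(self):
        """Return proxies that are not currently benched."""
        now = self._clock()
        return [p for p in self._failures
                if self._banned_until.get(p, 0.0) <= now]

    def pick(self):
        candidates = self.available()
        if not candidates:
            raise RuntimeError("no proxy currently available")
        return random.choice(candidates)

    def report_failure(self, proxy):
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures:
            # Bench the proxy for `cooldown` seconds and reset its counter.
            self._banned_until[proxy] = self._clock() + self.cooldown
            self._failures[proxy] = 0

    def report_success(self, proxy):
        if proxy in self._failures:
            self._failures[proxy] = 0


class PassiveProxyMiddleware:
    """Hypothetical downloader middleware that assigns proxies from the pool."""

    def __init__(self, pool):
        self.pool = pool

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting, e.g. ["http://host:port", ...]
        return cls(ProxyPool(crawler.settings.getlist("PROXY_LIST")))

    def process_request(self, request, spider):
        # Overwrites any proxy left over from a previous attempt.
        request.meta["proxy"] = self.pool.pick()
        return None

    def process_response(self, request, response, spider):
        self.pool.report_success(request.meta.get("proxy"))
        return response

    def process_exception(self, request, exception, spider):
        self.pool.report_failure(request.meta.get("proxy"))
        return None  # leave retrying to later middlewares
```

If this were enabled through DOWNLOADER_MIDDLEWARES, an order value between those of the built-in retry middleware (550) and HTTP proxy middleware (750) would probably be needed, so that the proxy is set before Scrapy applies it and failures are recorded before the retry is scheduled. Combined with a short DOWNLOAD_TIMEOUT, a bad proxy would then cost a couple of timeouts rather than a check on every request.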

The first approach is clearly faster. Is there a way to fix its problem?
Or is there a better approach altogether? Any suggestions from the experts would be appreciated.