Scrapy-Redis crawler stops taking URLs from the queue after running for a while

Time:11-06

I have stored two thousand start_urls in the Redis queue, but every time I run the crawler it only crawls a few dozen or a few hundred articles before it goes back into a state of waiting for start_urls.
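For context, a minimal scrapy-redis setup of the kind described here might look like the sketch below; the spider name, redis_key, and Redis address are placeholders rather than my exact code.

```python
# spiders/article_spider.py -- minimal scrapy-redis spider sketch
# (class name, spider name, and redis_key are placeholders)
from scrapy_redis.spiders import RedisSpider

class ArticleSpider(RedisSpider):
    name = "article"
    # Instead of a start_urls list, the spider pops URLs from this Redis key
    redis_key = "article:start_urls"

    def parse(self, response):
        yield {"url": response.url}
```

```python
# settings.py -- route scheduling and dedup through Redis
# (the Redis address is a placeholder)
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_PERSIST = True  # keep the queue and dupefilter between runs
REDIS_URL = "redis://redis-server:6379/0"
```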

CodePudding user response:


Can any expert help me analyze this? I've searched Baidu a lot but haven't found a solution.
The spider can take data from the queue and crawl; it just stops getting data after running for a while. If I restart the spider, there are some start_urls it can crawl again, but after a short time it stops receiving start_urls once more.
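When it hangs like that, it may be worth checking from the Redis side whether the start_urls key still holds entries, to distinguish "the spider stopped consuming" from "the queue is actually empty". A quick sketch with redis-py; the host and key name below are placeholders:

```python
# check_queue.py -- see whether start URLs are still waiting in Redis
# (host and key name are placeholders)
import redis

r = redis.Redis(host="redis-server", port=6379, db=0)
key = "article:start_urls"
# scrapy-redis stores start URLs in a list by default, so LLEN shows the backlog
print("pending start_urls:", r.llen(key))
```

If the count stays large while the spider sits idle, the spider has stopped polling; if it is zero, the queue really was drained.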

CodePudding user response:

Anyone?

CodePudding user response:

Are you crawling locally, or is it something else?

CodePudding user response:

Quoting weixin_48478655 (3rd floor):
Are you crawling locally, or is it something else?

The crawler runs on one server and Redis runs on another server.

CodePudding user response:

Quoting YnKness (4th floor):
Quoting weixin_48478655 (3rd floor):
Are you crawling locally, or is it something else?
The crawler runs on one server and Redis runs on another server.

Either way, it's best to disguise your requests, in case the site stops serving an IP after, say, a thousand visits. Even though you're crawling back and forth between your two servers, the target site may still ban the IP. If the same setup runs fine locally, I can only say there's a chance the server's IP has been banned.
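If it is a ban, the usual first step in Scrapy is to slow down and spread out the requests. A rough sketch of the relevant settings; the values are only illustrative and need tuning for the target site:

```python
# settings.py -- throttle requests so one IP looks less like a hammer
# (values are illustrative, not recommendations for any specific site)
DOWNLOAD_DELAY = 2                  # seconds between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 4
AUTOTHROTTLE_ENABLED = True         # back off automatically when responses slow down
# If the server IP still gets banned, a downloader middleware can rotate
# proxies by setting request.meta["proxy"] on each request.
```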

CodePudding user response:

Quoting YnKness (1st floor):

Can any expert help me analyze this? I've searched Baidu a lot but haven't found a solution.
The spider can take data from the queue and crawl; it just stops getting data after running for a while. If I restart the spider, there are some start_urls it can crawl again, but after a short time there are no start_urls once more.

I've run into the same problem as you. I push requests into the queue at intervals, and the spider will listen and crawl for a few minutes, but then it seems to stop listening to the queue. Have you managed to solve it?
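One thing that might be worth trying, though it is only a guess and not a confirmed fix: make the Redis connection more tolerant of timeouts and dropped idle connections, since scrapy-redis goes back to Redis for more requests whenever the spider runs idle, and a silently dead connection to a remote Redis would look exactly like "stopped listening to the queue". scrapy-redis passes REDIS_PARAMS through to the redis-py client:

```python
# settings.py -- harden the connection to the remote Redis
# (a guess at the cause, not a confirmed fix; values are illustrative)
REDIS_URL = "redis://redis-server:6379/0"
REDIS_PARAMS = {
    "socket_timeout": 30,          # fail instead of blocking forever on a dead socket
    "socket_connect_timeout": 10,
    "retry_on_timeout": True,
    "health_check_interval": 30,   # ping idle connections before reusing them
}
```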