When I use a proxy IP with the Scrapy framework, requests get a 302 redirect to another site and I can't get any data. Without the proxy I can crawl normally, but I get blocked after a while. The crawler's User-Agent is randomized, the other request headers are copied from the original page, and I have tested that the proxy IPs work. I've been stuck on this for several days. What is the problem, and how can I solve it?
These are all the points I have. Any help is appreciated.
CodePudding user response:
1. "When using a proxy IP with Scrapy, requests get a 302 redirect to another site and no data comes back, while crawling without the proxy works normally": most likely the proxies you are using are low quality, so the site's backend identifies them and restricts the data. Try switching to a batch of high-quality proxies, which usually means paying for them.
2. "Without the proxy I can crawl normally, but I get blocked after a while. The crawler's UA is random and the other request headers are copied from the original page":
- Try reducing the crawl frequency. Switching to a random UA too frequently can itself make the crawler easy to detect. I suggest increasing the sleep time between requests: start with a long delay and shorten it gradually to find the server's detection threshold.
Other: check that the relevant configuration items in the settings file have actually been modified accordingly.
CodePudding user response: