Problem with Lagou's anti-crawler mechanism


Experts, when crawling data from Lagou I keep getting the response {"status": false, "msg": "You are operating too frequently, please visit again later", "clientIp": "", "state": 2408}. How can I solve this? I have already added all the request headers and also set a dynamic User-Agent, but it still doesn't work.
headers = {
    # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    # 'Accept-Language': 'en',
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "JSESSIONID=ABAAAECABIEACCA6B0B35CC82843AFC00F32AC6B45A76AE; WEBTJ-ID=...; RECOMMEND_TIP=true; _ga=GA1.2.1963165012.1589965367; _gid=GA1.2.1320686678.1589965367; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1589965367; user_trace_token=...; LGUID=...; index_location_city=%E5%85%A8%E5%9B%BD; sensorsdata2015jssdkcross=...; sajssdk_2015_cross_new_user=1; TG-TRACK-CODE=search_code; X_MIDDLE_TOKEN=fbb896af01a0adbc319581251f75b474; X_HTTP_TOKEN=8f6ad38bd41b517633076998513a00d575eaa5b241; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1589967033; LGRID=...; SEARCH_ID=c6ae56ca6a0644a78264dbc3d61c1e4c",
    "Referer": "https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=",
    "X-Anit-Forge-Code": "0",  # header values must be strings, not ints
    "X-Anit-Forge-Token": None,  # None tells requests not to send this header
    "X-Requested-With": "XMLHttpRequest",
}

agent1 = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36'
agent2 = 'SAMSUNG-SGH-E808/1.0 *MzU0MTk0MDAwNTgzMDgx UP.Browser/ (GUI) MMP/1.0'
agent3 = 'SAMSUNG-SGH-D500C/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/ c.1.101 (GUI) MMP/2'
agent4 = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362'
agent5 = 'SAMSUNG-SGH-E100A/T2 UP.Browser/ (GUI) MMP/1.0'
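For reference, a dynamic User-Agent is commonly wired in by picking one string per request, for example with random.choice. The sketch below assumes that approach (build_headers and is_rate_limited are hypothetical helper names, not from the question) and also shows one way to recognize the throttling reply quoted above; the 2408 check is based only on that single response. Rotating the User-Agent does not change the client IP, which a frequency limit like this is most likely keyed on.

```python
import random

# Two of the User-Agent strings from the question (agent1 and agent4).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362",
]


def build_headers(base_headers, rng=random):
    """Copy the base headers and set a randomly chosen User-Agent for this request."""
    headers = dict(base_headers)  # copy so the shared base dict is not mutated
    headers["User-Agent"] = rng.choice(USER_AGENTS)
    return headers


def is_rate_limited(payload):
    """True for the 'operating too frequently' reply (status false, state 2408)."""
    return payload.get("status") is False and payload.get("state") == 2408
```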

Reply #1:

Control your request rate (slow down a bit), or use an IP proxy pool.
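Slowing down can be sketched as a loop that waits a randomized interval between pages and backs off exponentially whenever the 2408 reply comes back. fetch is a hypothetical callable (for example a wrapper around a requests call that returns the parsed JSON), and the default delays are illustrative guesses, not values known to stay under Lagou's limit.

```python
import random
import time


def crawl_pages(fetch, pages, base_delay=3.0, jitter=2.0, max_retries=3,
                sleep=time.sleep, rng=random):
    """Fetch pages one at a time, slowing down when the server reports state 2408.

    `fetch(page)` is a hypothetical callable returning the parsed JSON body.
    Pages still throttled after `max_retries` retries are left out of the result.
    """
    results = {}
    for page in pages:
        for attempt in range(max_retries + 1):
            payload = fetch(page)
            if not (payload.get("status") is False and payload.get("state") == 2408):
                results[page] = payload
                break
            if attempt < max_retries:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
        # Randomized gap between pages so the request rhythm is less regular.
        sleep(base_delay + rng.uniform(0, jitter))
    return results
```

Passing sleep and rng as parameters keeps the timing logic testable without actually waiting.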

Reply #2:

Quoting reply #1 (kohane Jary):
Control your request rate (slow down a bit), or use an IP proxy pool.

I already throttled the request rate and it still didn't work.
In the end I used Selenium to crawl.

Reply #3:

Quoting reply #2 (weixin_43497769):
Quoting reply #1 (kohane Jary):
Control your request rate (slow down a bit), or use an IP proxy pool.

I already throttled the request rate and it still didn't work.
In the end I used Selenium to crawl.

Your IP is what got blocked, so a proxy pool should probably work. Did you really throttle the requests to slower than Selenium?

Reply #4:

Use a proxy IP pool.
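A proxy pool can be sketched as a list of proxy URLs handed out round-robin, dropping any proxy whose response is the 2408 throttle. ProxyPool and fetch_json are hypothetical names; get is assumed to behave like requests.get or requests.post (accepting headers, proxies and timeout keyword arguments and returning an object with .json()). Where the proxy addresses come from is outside this sketch, and connection errors from dead proxies are not handled.

```python
class ProxyPool:
    """Round-robin pool of proxy URLs such as 'http://host:port' (hypothetical helper)."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._index = 0

    def __len__(self):
        return len(self._proxies)

    def next_proxies(self):
        """Return the next proxy in the proxies-dict shape that requests expects."""
        if not self._proxies:
            raise RuntimeError("proxy pool is empty")
        proxy = self._proxies[self._index % len(self._proxies)]
        self._index += 1
        return {"http": proxy, "https": proxy}

    def discard(self, proxies):
        """Drop a proxy that got throttled or blocked."""
        url = proxies["https"]
        if url in self._proxies:
            self._proxies.remove(url)


def fetch_json(get, url, pool, headers, attempts=3):
    """Try up to `attempts` proxies; return the first non-throttled JSON body, else None."""
    for _ in range(attempts):
        proxies = pool.next_proxies()
        payload = get(url, headers=headers, proxies=proxies, timeout=10).json()
        if payload.get("status") is False and payload.get("state") == 2408:
            pool.discard(proxies)  # this IP is rate-limited; stop using it
            continue
        return payload
    return None
```

With requests installed, a call would look like fetch_json(requests.get, url, ProxyPool(proxy_list), headers), where proxy_list is whatever list of proxy URLs you have available.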

Reply #5:

Quoting reply #3 (kohane Jary):
Quoting reply #2 (weixin_43497769):
Quoting reply #1 (kohane Jary):
Control your request rate (slow down a bit), or use an IP proxy pool.

I already throttled the request rate and it still didn't work.
In the end I used Selenium to crawl.

Your IP is what got blocked, so a proxy pool should probably work. Did you really throttle the requests to slower than Selenium?

The IP proxy pool works. Thanks, everyone!