IP blocked? Good books

Time:09-15

A possible solution
This approach is also based on ADSL dial-up. The difference is that it needs two ADSL dial-up servers, both of which act as proxies during crawling.
Suppose there are two ADSL dial-up servers, A and B, and the crawler runs on server C. The crawler first uses A as its proxy to access the network. If access gets blocked during crawling, it immediately switches to B as the proxy while A redials. If access is blocked again, it switches back to A while B redials, and so on, as the figures below illustrate:
[Figure: A serves as the proxy while B redials]
[Figure: B serves as the proxy while A redials]
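The switching scheme described above can be sketched roughly as follows. This is only a minimal illustration, not the poster's implementation: `ProxyPair`, `fetch_with_failover`, `BLOCKED_STATUSES`, and the injected `fetch` and `redial` callables are all hypothetical names, and treating HTTP 403 and 429 as "blocked" is an assumption.

```python
# Assumption: these status codes mean the current proxy IP has been blocked.
BLOCKED_STATUSES = {403, 429}


class ProxyPair:
    """Tracks which of the two dial-up proxies is active and which is standby."""

    def __init__(self, proxy_a, proxy_b):
        self.active = proxy_a
        self.standby = proxy_b

    def swap(self):
        """Make the standby proxy active and return the proxy just retired."""
        retired = self.active
        self.active, self.standby = self.standby, self.active
        return retired


def fetch_with_failover(url, pair, fetch, redial, max_switches=4):
    """Fetch url through the active proxy; on a block, switch and redial.

    fetch(url, proxy) must return a (status_code, body) tuple.
    redial(proxy) must ask that dial-up server to reconnect for a new IP.
    """
    for _ in range(max_switches + 1):
        status, body = fetch(url, pair.active)
        if status not in BLOCKED_STATUSES:
            return body
        retired = pair.swap()  # traffic moves to the other server first
        redial(retired)        # then the blocked server redials
    raise RuntimeError("still blocked after %d switches" % max_switches)
```

In a real crawler, `fetch` could wrap `requests.get` and `redial` could trigger the reconnect on the remote server; running the redial in the background would let crawling continue through the other proxy in the meantime.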





Crawler code (web):
import requests
import random

# Proxy addresses as given in the original post (no port specified)
pro = ['122.152.196.126', '114.215.174.227', '119.185.30.75']
head = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
url = 'http://www.whatismyip.com.tw/'
# Route the request through a randomly chosen proxy
r = requests.get(url, proxies={'http': random.choice(pro)}, headers=head)
r.encoding = r.apparent_encoding
print(r.status_code)
print(r.text)
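The snippet above hands one randomly chosen address to the `proxies` argument of `requests.get`. Requests expects that mapping to use lowercase scheme names as keys and proxy URLs as values, so a small helper can normalise the pool entries first. The following is a sketch only: `pick_proxies` is a hypothetical name, and defaulting to the `http://` scheme for bare addresses is an assumption.

```python
import random


def pick_proxies(pool, rng=random):
    """Pick one proxy from pool and return a mapping in the shape requests expects."""
    address = rng.choice(pool)
    if "://" not in address:
        # Assumption: bare "host:port" entries are plain HTTP proxies.
        address = "http://" + address
    # The same proxy is used for both http:// and https:// target URLs.
    return {"http": address, "https": address}
```

The result can be passed directly, for example `requests.get(url, proxies=pick_proxies(pro), headers=head)`.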






Other:

# coding=utf-8

import requests
import time
from lxml import etree


def getUrl():
    # Walk through the 33 listing pages
    for i in range(33):
        url = 'http://task.zbj.com/t-ppsj/p{}s5.html'.format(i + 1)
        spiderPage(url)


def spiderPage(url):
    if not url:
        return None

    try:
        proxies = {
            'http': 'http://221.202.248.52:80',
        }
        user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4295.400'
        headers = {'User-Agent': user_agent}
        htmlText = requests.get(url, headers=headers, proxies=proxies).text

        selector = etree.HTML(htmlText)
        # The attribute predicate of the original XPath was lost in extraction;
        # '//table/tr' is a broad stand-in and may need narrowing.
        tds = selector.xpath('//table/tr')
        for td in tds:
            # Paths start with './' so they are relative to the current row
            price = td.xpath('./td/p/em/text()')
            href = td.xpath('./td/p/a/@href')
            title = td.xpath('./td/p/a/text()')
            subTitle = td.xpath('./td/p/text()')
            deadline = td.xpath('./td/span/text()')
            price = price[0] if len(price) > 0 else ''
            title = title[0] if len(title) > 0 else ''
            href = href[0] if len(href) > 0 else ''
            subTitle = subTitle[0] if len(subTitle) > 0 else ''
            deadline = deadline[0] if len(deadline) > 0 else ''
            print(price, title, href, subTitle, deadline)
            print('-' * 100)
            spiderDetail(href)
    except Exception as e:
        print('wrong', e)


def spiderDetail(url):
    if not url:
        return None

    try:
        htmlText = requests.get(url).text
        selector = etree.HTML(htmlText)
        aboutHref = selector.xpath('//*[@id="utopia_widget_10"]/div[1]/div/div/div/p[1]/a/@href')
        price = selector.xpath('//*[@id="utopia_widget_10"]/div[1]/div/div/div/p[1]/text()')
        title = selector.xpath('//*[@id="utopia_widget_10"]/div[1]/div/div/h2/text()')
        contentDetail = selector.xpath('//*[@id="utopia_widget_10"]/div[2]/div/div[1]/div[1]/text()')
        publishDate = selector.xpath('//*[@id="utopia_widget_10"]/div[2]/div/div[1]/p/text()')
        # Python conditional expression: yields the value before `if` when the
        # condition is true, otherwise the value after `else`
        aboutHref = aboutHref[0] if len(aboutHref) > 0 else ''
        price = price[0] if len(price) > 0 else ''
        title = title[0] if len(title) > 0 else ''
        contentDetail = contentDetail[0] if len(contentDetail) > 0 else ''
        publishDate = publishDate[0] if len(publishDate) > 0 else ''
        print(aboutHref, price, title, contentDetail, publishDate)
    except Exception:
        print('error')


if __name__ == '__main__':
    getUrl()

CodePudding user response:

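For this requirement you should definitely look for HTTP proxies and switch to another proxy as soon as one gets banned. Your method rests on the premise that every redial yields a different IP. What's more, if the ISP puts its subscribers on a carrier-level LAN, then no matter how often you redial, the server still sees the same single IP.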
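Whether a redial really produces a new public IP can be checked before relying on it. Below is a minimal sketch under stated assumptions: `redial_until_new_ip` is a hypothetical helper, and `get_public_ip` and `redial` are callables you would supply yourself (for example, one that queries an IP-echo page and one that triggers the dial-up reconnect).

```python
def redial_until_new_ip(get_public_ip, redial, max_attempts=3):
    """Redial until the public IP changes; return the new IP, or None if it never does."""
    old_ip = get_public_ip()
    for _ in range(max_attempts):
        redial()
        new_ip = get_public_ip()
        if new_ip != old_ip:
            return new_ip
    # Same IP after every redial, e.g. when the ISP shares one egress address
    return None
```

A `None` result suggests redialing will not help on that line, and switching to a separate HTTP proxy is the remaining option.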

CodePudding user response:

666666