Crawler fails to crawl the page, asking the experts for guidance

Time:09-16


import requests

def getHTMLtext(url):
    # Request headers copied from the browser's developer tools
    head = {
        'authority': 'e-hentai.org',
        'method': 'GET',
        'path': '/g/1643510/1df3129936/?p=0',
        'scheme': 'https',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9,zh-TW;q=0.8,ja;q=0.7',
        'cache-control': 'max-age=0',
        'cookie': '__cfduid=dbc930db2fe57ecc79be64eefda3e78de1590384229',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'none',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36',
    }
    try:
        r = requests.get(url, headers=head, timeout=5)  # give up after 5 seconds
        r.raise_for_status()  # raise an exception on a 4xx/5xx status
        r.encoding = r.apparent_encoding  # use the encoding detected from the content
        return r.text
    except requests.RequestException:
        return "Error"

url = "https://e-hentai.org/g/1653031/2dcea4e2cc/"
print(getHTMLtext(url)[:1000])
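When the function only ever returns "Error", it is hard to tell whether the request timed out, was refused, or came back with a bad status. One way to narrow it down is to report the exception itself. The following is only a rough sketch: `fetch_html` and its parameters are names made up for this example, not part of the original script, and it assumes the `requests` package is installed.

```python
import requests


def fetch_html(url, headers=None, timeout=5):
    """Return the page text, or a string describing why the request failed.

    Hypothetical helper for illustration; not the original getHTMLtext.
    """
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # encoding guessed from the body
        return r.text
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, malformed URLs and HTTP error
        # statuses; the exception class name usually points at the cause.
        return f"Error: {type(exc).__name__}: {exc}"
```

Printing `fetch_html(url, headers=head)[:1000]` then shows either the start of the page or the exception class and message. Catching only `requests.RequestException` also means that a mistake in the code itself, such as a misspelled keyword argument, is raised as a normal traceback instead of being hidden behind "Error".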