Crawling a folder of HTML files with Python raises urllib3.exceptions.MaxRetryError

Time:09-29

import re
import urllib3

http = urllib3.PoolManager()
urllib3.disable_warnings()

# Download an HTML file
def download(url):
    result = http.request('GET', url)
    # Decode the downloaded HTML bytes into a string as UTF-8
    html_str = result.data.decode('utf-8')
    return html_str

# Analyse an HTML file
def analyse(html_str):
    # Use a regular expression to collect all <a> tags, such as <a href="https://bbs.csdn.net/topics/a.html">First page
    a_list = re.findall(r'<a[^>]*>', html_str)
    result = []
    # Iterate over the list of <a> tags
    for a in a_list:
        # Use a regular expression to extract the value of the href attribute, such as <a href='https://bbs.csdn.net/topics/a.html'>a
        g = re.search(r'''href\s*=\s*['"]([^>'"]*)['"]''', a)
        if g is not None:
            # The href attribute value is the value of the first group
            url = g.group(1)
            # Turn the url into an absolute link
            url = 'http://localhost:8888/files/' + url
            # Append the extracted url to the result list
            result.append(url)
    return result

# Entry-point function used to grab the HTML files
def crawler(url):
    # Print the url being fetched
    print(url)
    # Download the HTML file
    html = download(url)
    # Analyse the HTML code
    urls = analyse(html)
    # Recursively call crawler for each url
    for url in urls:
        crawler(url)

# Starting from the entry-point url, grab all the HTML files
crawler('http://localhost:8888/files')
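
To see what the two regular expressions in analyse pick up, here is a minimal sketch that runs them on a made-up HTML string. The sample markup and the name extract_hrefs are assumptions for illustration and are not part of the original post. It only uses the standard re module, so it runs without any server.

```python
import re

# Hypothetical sample page; the real files from the post are not shown.
SAMPLE_HTML = (
    '<html><body>'
    '<a href="a.html">First page</a> '
    "<a href='b.html'>b</a> "
    '<a name="x">no link</a>'
    '</body></html>'
)

def extract_hrefs(html_str):
    """Return the href values of all <a> tags, using the same two patterns as analyse."""
    hrefs = []
    # First pattern: every opening <a ...> tag
    for tag in re.findall(r'<a[^>]*>', html_str):
        # Second pattern: the quoted href value inside that tag
        match = re.search(r'''href\s*=\s*['"]([^>'"]*)['"]''', tag)
        if match is not None:
            hrefs.append(match.group(1))
    return hrefs

print(extract_hrefs(SAMPLE_HTML))
```

The tag without an href attribute is skipped, which is why analyse checks the search result before calling group(1).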


The contents of the HTML files are very simple, for example:

Index
<body>










Error: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8888): Max retries exceeded with url: /files (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002344deac978>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
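
The WinError 10061 part of the message means the connection was refused, which usually indicates that nothing is listening on localhost:8888 at the moment the script runs. One way to check this before crawling is sketched below, assuming only the standard socket module. The helper name port_is_open is hypothetical and does not come from the post.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (WinError 10061 on Windows) and timeouts.
        return False

if __name__ == "__main__":
    if port_is_open("localhost", 8888):
        print("Something is listening on localhost:8888")
    else:
        print("Nothing accepts connections on localhost:8888; start the web server first")
```

If the check reports that the port is closed, it is worth comparing the address and port in the crawler's entry URL with the ones the local web server actually uses.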
