Python crawler

Time:02-19

from bs4 import BeautifulSoup  # page parsing, to extract the data
import urllib.request, urllib.error  # build the URL request, fetch the web data

def main():
    baseurl = "https://movie.douban.com/top250?start="
    # 1. Crawl the web pages
    datalist = getData(baseurl)
    savepath = "douban film Top250.xls"
    # 3. Save the data
    saveData(savepath)
    askUPL("https://movie.douban.com/top250?start=")

# Crawl the web pages
def getData(baseurl):
    datalist = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)
        html = askUPL(url)
        # 2. Parse the data item by item
    return datalist

# Get the page content of a specified URL
def askURL(url):
    head = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.68"
    }
    request = urllib.request.Request(headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
        print(html)
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html

# Save the data
def saveData(savepath):
    print("save...")

if __name__ == "__main__":
    main()


Why does it never run to completion? It keeps showing this:
D:\douban\venv\Scripts\python.exe D:/douban/venv/spiders.py
Traceback (most recent call last):
  File "D:/douban/venv/spiders.py", line 51, in <module>
    main()
  File "D:/douban/venv/spiders.py", line 13, in main
    datalist = getData(baseurl)
  File "D:/douban/venv/spiders.py", line 24, in getData
    html = askUPL(url)
NameError: name 'askUPL' is not defined

CodePudding user response:

The URL is not mistyped; a newline was copied along with it.
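The NameError in the traceback points at the call site: the function is defined as askURL, but it is called as askUPL, so Python cannot find that name. Below is a minimal corrected sketch, not the asker's final program. It assumes the intent was to call askURL, and it also passes url to urllib.request.Request, which the posted code leaves out. The shortened User-Agent value and the fetch parameter on getData are hypothetical additions for illustration; getData returns the raw HTML of each page here because the parsing step is not shown in the post.

```python
import urllib.error
import urllib.request

# Shortened User-Agent for illustration only (an assumption, not the asker's value).
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def askURL(url):
    """Fetch one page and return its HTML, or "" if the request fails."""
    # Request takes the URL as its first argument; headers alone are not enough.
    request = urllib.request.Request(url, headers=HEADERS)
    html = ""
    try:
        with urllib.request.urlopen(request) as response:
            html = response.read().decode("utf-8")
    except urllib.error.URLError as e:
        # HTTPError carries a status code; a plain URLError only has a reason.
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


def getData(baseurl, fetch=askURL):
    """Collect the raw HTML of the ten list pages (25 entries per page).

    `fetch` is a hypothetical hook so the loop can be tried without network access.
    """
    pages = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)
        pages.append(fetch(url))  # askURL, not askUPL
    return pages
```

Calling getData("https://movie.douban.com/top250?start=") would then request the offsets 0, 25, ..., 225 in turn. The same rename applies to the askUPL call at the end of main().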