Help needed: after scraping with regular expressions and packing the results with zip, why are the last 5 items missing?
I'm a beginner. While scraping Qiushibaike I found that the data is complete before it goes through zip, but after zipping, the last 5 items are gone. Where did I go wrong?
Please help!
# encoding: utf-8
import re
import requests


def parse_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
        'Referer': 'https://www.qiushibaike.com/'
    }
    response = requests.get(url, headers=headers)
    text = response.text
    # The HTML tags inside the three patterns below were stripped when the
    # post was rendered; "..." marks the parts that did not survive.
    users = re.findall(r'<div>.*?...', text, re.DOTALL)
    levels = re.findall(r'<div>(.*?)...', text, re.DOTALL)
    contents = re.findall(r'<div\s>.*?...(.*?)', text, re.DOTALL)
    # Remove newlines and surrounding whitespace from each user name.
    usersall = []
    for user in users:
        y = re.sub(r"\n", "", user)
        usersall.append(y.strip())
    # Remove leftover HTML tags from each content string.
    contentall = []
    for content in contents:
        x = re.sub(r'<.*?>', "", content)
        contentall.append(x.strip())
    # Combine the three lists into one dict per post.
    poems = []
    for value in zip(usersall, levels, contentall):
        user, level, content = value
        poem = {
            'user': user,
            'level': level,
            'content': content
        }
        poems.append(poem)
    for poem in poems:
        print(poem)


def main():
    url = 'https://www.qiushibaike.com/text/page/1/'
    # for x in range(1, 6):
    #     url = 'https://www.qiushibaike.com/text/page/%s/' % x
    parse_page(url)


if __name__ == "__main__":
    main()
CodePudding user response:
Is the number of items you fetched correct?
CodePudding user response:
I printed each list before zipping, and every one of them prints in full. But once they go through zip, the last five items mysteriously disappear.
CodePudding user response:
First print the len of usersall, levels, and contentall. They are certainly not the same length.
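One way to run that check, as a minimal sketch. The helper name `check_lengths` is hypothetical (it is not in the original post), and the sample lists are made-up stand-ins for `usersall`, `levels`, and `contentall`.

```python
def check_lengths(**named_lists):
    """Return a {name: length} dict and a flag telling whether all lengths match."""
    lengths = {name: len(values) for name, values in named_lists.items()}
    all_equal = len(set(lengths.values())) <= 1
    return lengths, all_equal


# Made-up stand-ins for the three scraped lists.
usersall = ["user_a", "user_b", "user_c"]
levels = ["12", "30"]  # one entry short on purpose
contentall = ["text 1", "text 2", "text 3"]

lengths, all_equal = check_lengths(
    usersall=usersall, levels=levels, contentall=contentall
)
print(lengths, all_equal)
```

If `all_equal` comes back `False`, zip will silently drop every item beyond the shortest list.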
CodePudding user response:
a = [1, 2]
b = []
for c, d in zip(a, b):
    print(c, d)
Try running this.
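The snippet above prints nothing, because zip stops as soon as its shortest input runs out. As a rough illustration of the difference, the sketch below puts zip next to `itertools.zip_longest` from the standard library, which pads the shorter input instead. The variable names `truncated` and `padded` are only for this example.

```python
from itertools import zip_longest

a = [1, 2]
b = []

# zip stops at the shortest input, so nothing is produced here.
truncated = list(zip(a, b))

# zip_longest keeps going and fills the missing values with fillvalue.
padded = list(zip_longest(a, b, fillvalue=None))

print(truncated)
print(padded)
```

Note that zip_longest only pads at the end of the shorter list. If entries are missing from the middle of a scraped list, the remaining values would still be paired with the wrong items, so padding alone may not be the right fix for the scraper.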
CodePudding user response:
print(len(users), len(levels), len(contents))  # gives 25 17 25
So only 17 items get printed.
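The counts suggest the `levels` pattern matched on only 17 of the 25 posts, possibly because some posts carry no level element at all (an assumption, since the page markup is not shown here). One way to keep the fields aligned is to cut the page into one block per post first and then search inside each block, so a missing field becomes `None` instead of shifting the whole list. The sketch below uses invented HTML and invented tag names; `first_group` and `parse_posts` are hypothetical helpers, not the real site structure.

```python
import re

# Synthetic HTML invented for illustration; the real page markup will differ.
SAMPLE_HTML = """
<article><h2>alice</h2><div class="level">12</div><span>first joke</span></article>
<article><h2>anonymous</h2><span>second joke</span></article>
<article><h2>bob</h2><div class="level">30</div><span>third joke</span></article>
"""


def first_group(pattern, block):
    """Return the first captured group of pattern in block, or None if absent."""
    match = re.search(pattern, block, re.DOTALL)
    return match.group(1).strip() if match else None


def parse_posts(html):
    """Build one dict per post block so the fields of a post stay together."""
    posts = []
    for block in re.findall(r'<article>(.*?)</article>', html, re.DOTALL):
        posts.append({
            'user': first_group(r'<h2>(.*?)</h2>', block),
            'level': first_group(r'<div class="level">(.*?)</div>', block),
            'content': first_group(r'<span>(.*?)</span>', block),
        })
    return posts


posts = parse_posts(SAMPLE_HTML)
for post in posts:
    print(post)
```

With this layout all three sample posts are kept, and the one without a level simply has `'level': None`.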