I ran the example from the book. On the first run the script scrapes the content, but on the second and every later run (after restarting the script) no content shows up:
import json
import re
import time

import requests
from requests.exceptions import RequestException

def get_one_page(url):
    """Fetch a page and return its HTML, or None on any failure."""
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'
        }
        # timeout so a stalled connection raises (a RequestException) instead of hanging
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None

def parse_one_page(html):
    """Yield one dict per <dd> movie entry on the board page."""
    # re.S lets '.' match newlines, since each <dd> entry spans several lines
    pattern = re.compile(
        r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
        r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
        r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)
    items = pattern.findall(html)
    for item in items:
        yield {
            'index': item[0],
            'image': item[1],
            'title': item[2],
            'actors': item[3].strip()[3:],  # drop the 3-character label before the names
            'time': item[4].strip()[5:],    # drop the 5-character label before the date
            'score': item[5] + item[6]      # integer part + fractional part
        }

def write_to_file(content):
    # Append one JSON object per line; ensure_ascii=False keeps Chinese text readable
    with open('result.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')
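
def main(offset):
    url = 'http://maoyan.com/board/4?offset=' + str(offset)
    html = get_one_page(url)
    if html is None:
        # get_one_page returns None on a request error or non-200 status;
        # passing None to the regex would raise a TypeError
        print('Failed to fetch', url)
        return
    for item in parse_one_page(html):
        print(item)
        write_to_file(item)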

if __name__ == '__main__':
    # 10 pages of 10 movies each: offset = 0, 10, ..., 90
    for i in range(10):
        main(offset=i * 10)
        time.sleep(1)
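The post does not say what the later runs actually receive, so one way to narrow it down is to classify each fetched page before parsing it. The sketch below is a hypothetical helper: `diagnose`, `PATTERN` and `SAMPLE_BOARD` are made up for illustration, and the sample markup is only shaped to fit the regex, not copied from Maoyan. It reuses the same regular expression as `parse_one_page` and separates a failed request (`None`, which `get_one_page` returns for errors and non-200 responses) from a page that downloaded with status 200 but contains no board entries. That second case is what you would see if the site served a different page, such as a verification page, to repeated requests; this cause is an assumption, not something the post confirms.

```python
import re

# Same regular expression as parse_one_page; re.S lets '.' match newlines
# because each <dd> entry spans several lines.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
    r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
    re.S,
)


def diagnose(html):
    """Hypothetical helper: classify what get_one_page returned."""
    if html is None:
        # get_one_page returns None for request errors and non-200 responses
        return 'request failed or non-200 status'
    if not PATTERN.findall(html):
        # HTTP 200, but the body is not the board page the regex expects
        return 'page fetched, but no board entries matched'
    return 'ok'


# Made-up markup shaped like one board entry (not Maoyan's real HTML).
SAMPLE_BOARD = (
    '<dd>\n'
    '  <i class="board-index board-index-1">1</i>\n'
    '  <img data-src="https://example.com/poster.jpg" alt="">\n'
    '  <p class="name"><a href="/films/1">Example Film</a></p>\n'
    '  <p class="star">Starring: A, B</p>\n'
    '  <p class="releasetime">Released: 1993-01-01</p>\n'
    '  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>\n'
    '</dd>\n'
)

print(diagnose(None))          # request failed or non-200 status
print(diagnose(SAMPLE_BOARD))  # ok
print(diagnose('<html><body>Please verify</body></html>'))  # page fetched, but no board entries matched
```

Calling `diagnose(html)` in `main` right after `get_one_page(url)` and printing the result would show which of the two failure cases the second run hits; saving the HTML of a failing page to a file makes it easy to inspect what was actually returned.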