import requests
import re

# 1. Download a web page
url = 'https://www.fpzw.com/xiaoshuo/88/88413/'
# 2. Simulate a browser sending an HTTP request
response = requests.get(url)  # type: object
# 3. Set the encoding
response.encoding = 'GBK'
# 4. Get the page source
html = response.text
# 5. Get the novel's name
title = re.findall(r'var articlename=\'(.*?)\';', html)
print(title)
# 6. Create a new file to save the novel in
fb = open('%s.txt' % title, 'w', encoding='GBK')
# 7. Get the information for each chapter
dl = re.findall(r'</strong>', html, re.S)[0]
chapter_info_list = re.findall(r'<dd>(.*?)', dl, re.S)
print(chapter_info_list)
# 8. Loop over the chapters and download each one
for chapter_info in chapter_info_list:
    chapter_title = chapter_info[1]
    chapter_url = chapter_info[0]
    chapter_url = "https://www.fpzw.com%s" % chapter_url
    # 8.2 Download the chapter content
    chapter_response = requests.get(chapter_url)
    chapter_response.encoding = "utf-8"
    chapter_html = chapter_response.text
    # 8.3 Extract the chapter text
    chapter_content = re.findall(r'<script language="javascript">tongzhi\(\);</script>(.*?)', chapter_html, re.S)[0]
    # 8.4 Clean up the data
    chapter_content = chapter_content.replace(' ', '')
    chapter_content = chapter_content.replace('&nbsp;', ' ')
    chapter_content = chapter_content.replace('<br/>', '\n')
    # 8.5 Save
    fb.write(chapter_title)
    fb.write(chapter_content)
    fb.write('\n')
    print(chapter_url)
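One thing that can be checked without touching the network: as the patterns appear in this post, several of them end in (.*?) with nothing after it. A lazy group at the very end of a pattern is satisfied by the empty string, so re.findall returns empty strings even when the page is fine. The sketch below shows this on a small sample. The sample markup, the two-group chapter pattern, and the helper name first_or_none are all assumptions made for illustration; the real page may be structured differently.

```python
import re

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<dl>
<dt><strong>Chapter list</strong></dt>
<dd><a href="/xiaoshuo/88/88413/1.html">Chapter 1</a></dd>
<dd><a href="/xiaoshuo/88/88413/2.html">Chapter 2</a></dd>
</dl>
"""

# Pattern as printed in the post: the lazy group has nothing after it,
# so it stops immediately and captures an empty string each time.
empty_matches = re.findall(r'<dd>(.*?)', SAMPLE_HTML, re.S)
print(empty_matches)

# Assumed pattern with a closing anchor and two capture groups:
# (relative URL, chapter title).
CHAPTER_PATTERN = r'<dd><a href="(.*?)">(.*?)</a></dd>'


def first_or_none(pattern, text):
    """Return the first match, or None instead of raising IndexError."""
    matches = re.findall(pattern, text, re.S)
    return matches[0] if matches else None


chapters = re.findall(CHAPTER_PATTERN, SAMPLE_HTML, re.S)
for chapter_url, chapter_title in chapters:
    print(chapter_title, "https://www.fpzw.com%s" % chapter_url)

# A pattern that matches nothing gives None rather than a crash.
print(first_or_none(r'var articlename=\'(.*?)\';', SAMPLE_HTML))
```

With two groups in the pattern, re.findall returns a list of tuples, which is what the chapter_info[0] and chapter_info[1] indexing in step 8 expects. Checking for an empty result before indexing with [0] makes it easier to see which step is the one returning nothing.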
CodePudding user response:
It keeps returning an empty value. I only started learning recently and I'm just following a tutorial, so why is it wrong? (crying)
CodePudding user response:
Bumping this so it doesn't sink!
CodePudding user response:
Try changing the brackets to English (half-width) ones.
CodePudding user response:
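The bracket suggestion refers to full-width punctuation typed with a Chinese input method. Python rejects a full-width bracket in code with a SyntaxError, and inside a regex string a full-width character simply will not match its ASCII look-alike. Here is a rough sketch for spotting such characters in a piece of source text. The character list is a small assumed subset, and find_fullwidth and to_ascii_punctuation are hypothetical helper names, not part of any library.

```python
# Common full-width punctuation and the ASCII character it resembles.
# This is an assumed, incomplete list for illustration.
PUNCTUATION = {
    "（": "(",
    "）": ")",
    "，": ",",
    "：": ":",
    "；": ";",
    "‘": "'",
    "’": "'",
    "“": '"',
    "”": '"',
}
TABLE = str.maketrans(PUNCTUATION)


def find_fullwidth(source):
    """Return (line_number, column, character) for each listed mark found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in PUNCTUATION:
                hits.append((line_no, col, ch))
    return hits


def to_ascii_punctuation(source):
    """Replace the listed full-width marks with their ASCII equivalents."""
    return source.translate(TABLE)


# A line of code typed with full-width brackets and comma.
snippet = "title = re.findall（r'var articlename=(.*?);'， html）"
print(find_fullwidth(snippet))
print(to_ascii_punctuation(snippet))
```

Blindly translating a whole file would also change full-width punctuation that is meant to be there, for example inside Chinese text in a string, so it is probably safer to use find_fullwidth to locate the characters and fix them by hand.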