Python crawler beginner question: asking the experts for help

A question for the experts: on the site I am crawling, the first page is */6444.html, the second page is */6444_2.html, and the third page is */6444_3.html.
The URL rule from the second page onward is easy to work out, but how do I include the first page as well? How should the code below be modified?

 

"""
@description: learning python3
@author: CHZ
@datetime: 2021-03-21 15:19:27
"""
# import the requests module
import requests
# import the BeautifulSoup module
from bs4 import BeautifulSoup
# import the urllib.request module (provides urlretrieve)
import urllib.request


# running counter of downloaded images, also used as the file name
x = 0


def getKunvImg(page=2):
    global x
    # address of the page that holds the pictures
    urlKunv = 'https://*/2019/6444_{}.html'.format(page)
    # add headers so that the crawler identifies itself as a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'}
    # send the network request and get the returned HTML
    res = requests.get(urlKunv, headers=headers)
    # set the encoding to utf-8
    res.encoding = "utf-8"
    # parse the HTML
    soup = BeautifulSoup(res.text, 'html.parser')
    # select the elements by CSS class
    for new in soup.select('.content'):
        # select() returns a list of tags; continue only if it is not empty
        if len(new.select('a')) > 0:
            # take the src path of the first img tag
            imgsrc = new.select('img')[0]['src']
            # print the image address
            # print(new.select('img')[0]['src'])
            # download the image at the src path into the images folder under the project directory
            urllib.request.urlretrieve(imgsrc, './images/%s.jpg' % x)
            x += 1
            # print which image is being downloaded
            print('Downloading image %d' % x)


for i in range(2, 48):
    # print which page is being downloaded
    print('Downloading page {}'.format(i))
    getKunvImg(i)
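
One possible way to cover the first page, as a minimal sketch: build the URL in a small helper that leaves out the `_N` suffix when the page number is 1, then start the loop at 1 instead of 2. The helper name `build_page_url` and the base address `https://example.com/2019/6444` are hypothetical placeholders (the real host is elided as `*` in the question), and the sketch assumes the first page really is `6444.html` as described above.

```python
def build_page_url(page, base='https://example.com/2019/6444'):
    """Return the URL of a 1-based page number.

    `base` is a placeholder; the real site address is not given in the question.
    """
    if page < 1:
        raise ValueError('page numbers start at 1')
    # page 1 has no suffix, later pages use _2, _3, ...
    suffix = '' if page == 1 else '_{}'.format(page)
    return '{}{}.html'.format(base, suffix)


# pages 1 to 47: same upper bound as range(2, 48) in the question, plus page 1
urls = [build_page_url(p) for p in range(1, 48)]
print(urls[0])
print(urls[1])
```

With a helper like this, getKunvImg could assign `urlKunv = build_page_url(page)` and the loop could become `for i in range(1, 48)`, so the first page goes through the same download code as the others.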