Python crawler for website images - CodePudding

import re
import requests
import urllib.request
from bs4 import BeautifulSoup
from urllib.request import urlopen
import time
import os

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
# https://stock.tuchong.com/?source=tc_pc_home
urlindex = 'https://stock.tuchong.com/?source=tc_pc_home'
reqindex = urllib.request.Request(url=urlindex, headers=headers)
htmlCodeIndex = urllib.request.urlopen(reqindex).read()
dataIndex = htmlCodeIndex.decode('utf-8')

soupIndex = BeautifulSoup(dataIndex, 'html.parser')

regIndex = r'"topicId":.*?,'  # pattern string
reg_ImgIndex = re.compile(regIndex)  # compile the pattern
imglistIndex = reg_ImgIndex.findall(dataIndex)
print(imglistIndex)
xIndex = 0

for igIndex in imglistIndex:
    xIndex = xIndex + 1
    print(igIndex[10:-1])  # strip the leading '"topicId":' and the trailing comma

    indexId = igIndex[10:-1]
    # headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    topicId = 49364  # hard-coded topic id; indexId is only used for the folder name below
    url = 'https://stock.tuchong.com/topic?topicId=' + str(topicId)
    req = urllib.request.Request(url=url, headers=headers)
    # urllib.request.urlopen(req).read()
    # page = urllib.request.urlopen(url)
    htmlCode = urllib.request.urlopen(req).read()
    data = htmlCode.decode('utf-8')
    # print(data)

    # pagefile = open('pagecode1.txt', 'wb')
    # pagefile.write(htmlCode)
    # page.close()

    soup = BeautifulSoup(data, 'html.parser')

    # reg = r'src="(.+?\.jpg)"'  # regular expression for image URLs: reg = r'src="(.+?\.jpg)"'
    reg = r'"imageId":".*?"'
    reg_img = re.compile(reg)  # compile the pattern so it runs faster
    imglist = reg_img.findall(data)
    print(imglist)
    x = 0
    # pageFile = open('pageCode2.txt', 'wb')  # open pageCode.txt in write mode
    imglist2 = []
    for img in imglist:
        x = x + 1
        print(img[11:-1])  # strip the leading '"imageId":"' and the closing quote
        imglist2.append(img[11:-1])

    print(imglist2)

    x = 0
    for ig in imglist2:
        x = x + 1
        print(ig)

        # urllib.request.urlretrieve('http://ppic.meituba.com:83/uploads3/181201/3-1Q20111553V11.jpg', '%s.jpg' % x)
        # x + 1
        # download the image to the local disk
        image_url = 'https://icweiliimg6.pstatp.com/weili/l/' + ig + '.webp'
        # image_url2 = ''
        # image_url = img
        file_path = 'E:/new folder 4 images/pictures/creeper' + indexId + '/picture'

        try:
            if not os.path.exists(file_path):
                os.makedirs(file_path)  # create the folder if it does not exist
            file_suffix = os.path.splitext(image_url)[1]
            print(file_suffix)
            filetype = 'webp'
            filename = '{}/{}.{}'.format(file_path, x, filetype)  # build the file name inside the folder
            # x = x + 1
            print(filename)
            # urllib.request.urlretrieve(image_url, filename=filename)  # urlretrieve may fail with 403 Forbidden
            # region download with requests instead, to avoid the 403
            res = requests.get(image_url)
            with open(filename, 'wb') as f:
                f.write(res.content)
            print(111)
            # endregion
        except IOError as e:
            print(1, e)

        except Exception as e:
            print(2, e)

        time.sleep(2)
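
The ID extraction and file naming above could also be written with capture groups and `os.path.join`, which avoids the hand-counted slices (`[10:-1]`, `[11:-1]`). This is only a sketch under assumptions: the helper names are made up for illustration, topic IDs are assumed to be numeric, and the download helper uses `urllib.request` with the same User-Agent header instead of `requests`. Whether that header is enough to avoid the 403 on this site is not confirmed.

```python
import os
import re
import urllib.request

# URL prefix and ".webp" suffix are taken from the code above.
IMAGE_BASE = "https://icweiliimg6.pstatp.com/weili/l/"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) "
                  "Gecko/20100101 Firefox/23.0"
}

# Capture groups return just the id, so no slicing is needed.
# Assumption: topic ids are digits, optionally wrapped in quotes.
TOPIC_ID_RE = re.compile(r'"topicId":\s*"?(\d+)"?')
IMAGE_ID_RE = re.compile(r'"imageId":\s*"(.*?)"')


def extract_topic_ids(page_text):
    """Return every topicId value found in the page text, as strings."""
    return TOPIC_ID_RE.findall(page_text)


def extract_image_ids(page_text):
    """Return every imageId value found in the page text, as strings."""
    return IMAGE_ID_RE.findall(page_text)


def build_image_url(image_id):
    """Build the image URL the same way the original code does."""
    return IMAGE_BASE + image_id + ".webp"


def build_filename(folder, index, filetype="webp"):
    """Place the file inside the folder, e.g. <folder>/3.webp."""
    return os.path.join(folder, "{}.{}".format(index, filetype))


def download_image(image_url, filename):
    """Fetch one image with a browser-like User-Agent and save it."""
    folder = os.path.dirname(filename)
    if folder:
        os.makedirs(folder, exist_ok=True)
    req = urllib.request.Request(image_url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp, open(filename, "wb") as f:
        f.write(resp.read())
```

For example, on a made-up snippet such as `'{"topicId":49364,"imageId":"abc"}'`, `extract_topic_ids` returns `['49364']` and `extract_image_ids` returns `['abc']`.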

CodePudding user response:

Is this an example?

CodePudding user response:

Quoting dabingsou's reply (1st floor):

Is this an example?

Yeah, I wrote a small example.