Python crawler for website images


import re
import requests
import urllib.request
from bs4 import BeautifulSoup
from urllib.request import urlopen
import time
import os

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
# https://stock.tuchong.com/?source=tc_pc_home
urlindex = 'https://stock.tuchong.com/?source=tc_pc_home'
reqindex = urllib.request.Request(url=urlindex, headers=headers)
htmlCodeIndex = urllib.request.urlopen(reqindex).read()
dataIndex = htmlCodeIndex.decode('utf-8')

soupIndex = BeautifulSoup(dataIndex, 'html.parser')

regIndex = r'"topicId":.*?,'  # pattern that matches each topic ID entry
reg_ImgIndex = re.compile(regIndex)  # compile the pattern
imglistIndex = reg_ImgIndex.findall(dataIndex)
print(imglistIndex)

xIndex = 0  # topic counter; must exist before the loop increments it
for igIndex in imglistIndex:
    xIndex = xIndex + 1
    print(igIndex[10:-1])

    # Drop the leading '"topicId":' (10 characters) and the trailing comma
    indexId = igIndex[10:-1]
    # headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    url = 'https://stock.tuchong.com/topic?topicId=' + str(indexId)
    req = urllib.request.Request(url=url, headers=headers)
    # urllib.request.urlopen(req).read()
    # page = urllib.request.urlopen(url)
    htmlCode = urllib.request.urlopen(req).read()
    data = htmlCode.decode('utf-8')
    # print(data)

    # pagefile = open('pagecode1.txt', 'wb')
    # pagefile.write(htmlCode)
    # page.close()

    soup = BeautifulSoup(data, 'html.parser')

    # reg = r'src="(.+?\.jpg)"'  # regular expression that matches image URLs
    reg = r'"imageId":".*?"'
    reg_img = re.compile(reg)  # precompile so repeated matching runs faster
    imglist = reg_img.findall(data)
    print(imglist)
    # pageFile = open('pageCode2.txt', 'wb')  # open the file for writing
    x = 0  # image counter, also used in the file names further down
    imglist2 = []  # image IDs collected for this topic
    for img in imglist:
        x = x + 1
        print(img[11:-1])
        # Drop the leading '"imageId":"' (11 characters) and the closing quote
        imglist2.append(img[11:-1])

    print(imglist2)

    for ig in imglist2:
        x = x + 1
        print(ig)

        # urllib.request.urlretrieve('http://ppic.meituba.com:83/uploads3/181201/3-1Q20111553V11.jpg', '%s.jpg' % x)
        # x + 1
        # Download the image to the local disk
        image_url = "https://icweiliimg6.pstatp.com/weili/l/" + ig + ".webp"
        # image_url2 = ""
        # image_url = img
        file_path = 'E:/new folder 4 images/pictures/creeper' + indexId + '/picture'

        try:
            if not os.path.exists(file_path):
                os.makedirs(file_path)  # create the folder if it does not exist
            file_suffix = os.path.splitext(image_url)[1]
            print(file_suffix)
            filename = '{}{}'.format(file_path + str(x), file_suffix)  # build the file name
            # x = x + 1
            print(filename)
            # urllib.request.urlretrieve(image_url, filename=filename)  # urlretrieve may fail with 403 Forbidden
            # region: download with requests instead, to avoid the 403
            res = requests.get(image_url)
            with open(file_path + str(x) + '.jpg', 'wb') as f:
                f.write(res.content)
            print(111)
            # endregion
        except IOError as e:
            print(1, e)

        except Exception as e:
            print(2, e)

        time.sleep(2)
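
The code above pulls each ID out of the matched text by slicing at fixed offsets, which only works while the key name and the surrounding punctuation stay exactly the same. A minimal sketch of an alternative, assuming the page embeds JSON-like text: let a capture group return the value directly. The helper name `extract_values` and the `sample` string are made up for illustration; the real page may be formatted differently.

```python
import re


def extract_values(text, key):
    """Return every value stored under `key` in JSON-like text (hypothetical helper)."""
    # The capture group holds only the value; optional quotes sit outside it.
    pattern = re.compile(r'"%s"\s*:\s*"?([^",}\s]+)"?' % re.escape(key))
    return pattern.findall(text)


# Made-up sample that mimics the "topicId" / "imageId" fields the crawler looks for.
sample = '{"topicId":12345,"images":[{"imageId":"abc123"},{"imageId":"def456"}]}'

print(extract_values(sample, "topicId"))
print(extract_values(sample, "imageId"))
```

With this approach the results can be used as-is in place of `igIndex[...]` and `img[...]`, without counting characters.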

CodePudding user response:

This example is for you.

CodePudding user response:

Quoting the 1st-floor reply from dabingsou:
This example is for you.
Yes, I wrote a small example.