Implementing a depth-controlled crawl with Scrapy

I want to download the images on a website, then follow every link (`<a>` tag) on the site and download the images on those pages as well, and so on. Below is the code I wrote. Could someone experienced explain how to customize the crawl depth?

```python
# -*- coding: utf-8 -*-
import scrapy
from ..items import ImgspiderItem

full_img_list = []


class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    # url = input("Please enter the site to crawl: ")
    allowed_domains = ['meishij.net']
    start_urls = ['https://www.meishij.net/china-food/caixi/chuancai/']

    def parse(self, response):
        img_list = response.xpath('//img/@src').extract()
        a_list = response.xpath('//a/@href').extract()

        if img_list:
            item = ImgspiderItem()
            for img in img_list:
                if img is not None:
                    if img[0:4] != 'http':
                        # Protocol-relative URL such as //host/a.jpg: add a scheme
                        img = 'https:' + img
                        full_img_list.append(img)
                    elif img[0:5] != 'https':
                        # Plain http:// URL: rewrite the scheme to https
                        img = 'https:' + img.split(':', 1)[1]
                        full_img_list.append(img)
                    else:
                        full_img_list.append(img)
            item['image_urls'] = full_img_list
            yield item
        for a in a_list:
            if a is not None:
                # Note: relative links such as /foo are not handled here;
                # response.urljoin() would resolve them against response.url.
                if a[0:4] != 'http':
                    a = 'https:' + a
                elif a[0:5] != 'https':
                    a = 'https:' + a.split(':', 1)[1]
                # Every link is sent back to parse(), so the recursion has no depth limit.
                yield scrapy.Request(
                    a,
                    callback=self.parse
                )
```

Reply to the original poster, weixin_43137680:

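Define a new function and change the `callback` argument on the last line of `parse` so that it points to that function. The next step is then handled inside the new function, which receives the response for that URL. Chaining one callback per level this way lets you decide exactly what happens at each depth, and the crawl stops at the level whose callback yields no further requests.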