I want to download all the images on a web page, then follow every link on that page and download the images on those pages too, and so on recursively. Below is the code I wrote. Could anyone advise how to set a custom crawl depth?
# -*- coding: utf-8 -*-
import scrapy

from ..items import ImgspiderItem

full_img_list = []


class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    # url = input("Please enter the site to crawl: ")
    allowed_domains = ['meishij.net']
    start_urls = ['https://www.meishij.net/china-food/caixi/chuancai/']

    def parse(self, response):
        img_list = response.xpath('//img/@src').extract()
        a_list = response.xpath('//a/@href').extract()
        if img_list:
            item = ImgspiderItem()
            for img in img_list:
                if img is not None:
                    # Normalize protocol-relative and http:// image URLs to https://
                    if img[0:4] != 'http':
                        img = 'https:' + img
                        full_img_list.append(img)
                    elif img[0:5] != 'https':
                        img = 'https:' + img.split(':', 1)[1]
                        full_img_list.append(img)
                    else:
                        full_img_list.append(img)
            item['image_urls'] = full_img_list
            yield item
        # Follow every link found on the page and parse it the same way
        for a in a_list:
            if a is not None:
                if a[0:4] != 'http':
                    a = 'https:' + a
                elif a[0:5] != 'https':
                    a = 'https:' + a.split(':', 1)[1]
                yield scrapy.Request(
                    a,
                    callback=self.parse
                )
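For the depth question, one option I believe applies here is Scrapy's built-in DEPTH_LIMIT setting (enforced by its DepthMiddleware), which can be set per spider through custom_settings. The sketch below is only a minimal illustration under that assumption; the spider name and the limit value 2 are placeholders, not anything from my project:

import scrapy


class DepthLimitedSpider(scrapy.Spider):
    name = 'depth_limited_spider'  # hypothetical name, for illustration only
    allowed_domains = ['meishij.net']
    start_urls = ['https://www.meishij.net/china-food/caixi/chuancai/']

    # DEPTH_LIMIT is handled by Scrapy's DepthMiddleware: requests whose depth
    # (number of link hops from a start URL) exceeds the limit are dropped.
    custom_settings = {'DEPTH_LIMIT': 2}  # example value, adjust as needed

    def parse(self, response):
        # DepthMiddleware records the current depth in response.meta['depth']
        self.logger.info('depth=%s url=%s', response.meta.get('depth'), response.url)
        for href in response.xpath('//a/@href').extract():
            # response.follow also resolves relative links against the current page
            yield response.follow(href, callback=self.parse)

If that matches what is needed, the same custom_settings line could presumably be added to the spider above (or DEPTH_LIMIT set in settings.py) without changing the parse logic.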