Implementing a depth crawl with Scrapy

Time:09-21

I want to download all the images on a website, then follow the links in every a tag on the page and download the images on those pages too, and so on. This is the code I wrote. Could someone advise me on how to do a custom depth crawl?

 
# -*- coding: utf-8 -*-
import scrapy
from ..items import ImgspiderItem

full_img_list = []


class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    # url = input("Please enter the site to crawl: ")
    allowed_domains = ['meishij.net']
    start_urls = ['https://www.meishij.net/china-food/caixi/chuancai/']

    def parse(self, response):
        img_list = response.xpath('//img/@src').extract()
        a_list = response.xpath('//a/@href').extract()

        if img_list:
            item = ImgspiderItem()
            for img in img_list:
                if img is not None:
                    if img[0:4] != 'http':
                        img = 'https:' + img
                        full_img_list.append(img)
                    elif img[0:5] != 'https':
                        img = 'https:' + img.split(':', 1)[1]
                        full_img_list.append(img)
                    else:
                        full_img_list.append(img)
            item['image_urls'] = full_img_list
            yield item
        for a in a_list:
            if a is not None:
                if a[0:4] != 'http':
                    a = 'https:' + a
                elif a[0:5] != 'https':
                    a = 'https:' + a.split(':', 1)[1]
                yield scrapy.Request(
                    a,
                    callback=self.parse
                )

CodePudding user response:

In reply to the original poster, weixin_43137680:

Define a new function, and change the callback parameter on the last line of the parse function so that it points to the new function. You can then do the next step of the work inside that function: the new function receives the response for the URL a that was passed in, so that URL becomes its own URL.