Scrapy didn't return result

Time:07-16

I have this code:

import scrapy

class AstroSpider(scrapy.Spider):
    name = "Astro"
    allowed_domains = ['www.astrolighting.com']
    start_urls = ['https://www.astrolighting.com/products']

    def parse(self, response, **kwargs):
        for link in response.css('article.product-listing-item a::attr(href)'):
            yield response.follow(link.get(), callback=self.parse_items)

    def parse_items(self, response):
        for link in response.css('div.variants.variants--large a::attr(href)'):
            yield response.follow(link.get(), callback=self.parse_item)

    def parse_item(self, response):
        print(f"!!!!!!!!!!!!!!!!!!!!!!!!!!!")
        yield {
            'name': response.css('div.detail__right h1::text').get(),
            'material': response.css('div.detail__right p span::text').getall()[0],
            'id': response.css('div.detail__right p span::text').getall()[1].strip()
        }

The spider runs, but it returns no items at all. Why? It looks as if the "parse_item" method is never called.

CodePudding user response:

I haven't tested your code, but according to @alexpdev you need to pass dont_filter=True when you create the requests:


import scrapy

class AstroSpider(scrapy.Spider):
    name = "Astro"
    allowed_domains = ['www.astrolighting.com']
    start_urls = ['https://www.astrolighting.com/products']

    def parse(self, response, **kwargs):
        # Follow every product link on the listing page.
        for link in response.css('article.product-listing-item a::attr(href)'):
            yield response.follow(link.get(), callback=self.parse_items, dont_filter=True)

    def parse_items(self, response):
        # dont_filter=True stops Scrapy from dropping a URL it has already requested.
        for link in response.css('div.variants.variants--large a::attr(href)'):
            yield response.follow(link.get(), callback=self.parse_item, dont_filter=True)

    def parse_item(self, response):
        print("!!!!!!!!!!!!!!!!!!!!!!!!!!!")
        # Extract the span texts once; this assumes at least two spans are present.
        spans = response.css('div.detail__right p span::text').getall()
        yield {
            'name': response.css('div.detail__right h1::text').get(),
            'material': spans[0],
            'id': spans[1].strip()
        }

I also suggest checking which links parse_items actually extracts, for example with:

print(response.css('div.variants.variants--large a::attr(href)').getall())
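
A related check is whether any of those links resolves back to the URL of the page it was found on, since such a request would be treated as a duplicate. Here is a minimal sketch in plain Python: the helper name links_back_to_page and the example paths are hypothetical, not taken from the site.

```python
from urllib.parse import urljoin


def links_back_to_page(page_url, hrefs):
    """Return the hrefs that resolve to the URL of the page they were found on."""
    # urljoin resolves relative hrefs against the page URL.
    return [href for href in hrefs if urljoin(page_url, href) == page_url]


# Hypothetical values for illustration only.
page = "https://www.astrolighting.com/products/example-product"
found = ["/products/example-product", "/products/another-product"]
print(links_back_to_page(page, found))
```

Inside the spider you could call it with response.url and the list returned by .getall().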

CodePudding user response:

The output is empty because the parse_item method is never called.

After walking through each step of your code and the URLs involved, I found that the link extracted in the parse method is identical to the link extracted in the parse_items method.

By default, Scrapy filters out requests for URLs it has already visited, so when it encounters the same URL again in the request with the parse_item callback, it drops that request as a duplicate.
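
The effect can be illustrated with a simplified model of a duplicate filter. This is only a sketch and not Scrapy's actual implementation: FakeRequest and SeenUrlFilter are hypothetical names, the product URL is made up, and Scrapy's real filter compares request fingerprints rather than raw URL strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRequest:
    """Hypothetical stand-in for a request, reduced to two fields."""
    url: str
    dont_filter: bool = False


class SeenUrlFilter:
    """Simplified model: remembers scheduled URLs and rejects repeats."""

    def __init__(self):
        self._seen = set()

    def should_schedule(self, request):
        # Requests flagged dont_filter skip the duplicate check entirely.
        if request.dont_filter:
            return True
        if request.url in self._seen:
            return False
        self._seen.add(request.url)
        return True


# Made-up product URL, used for both requests as in the question.
url = "https://www.astrolighting.com/products/example-product"
dupefilter = SeenUrlFilter()

first = dupefilter.should_schedule(FakeRequest(url))     # request from parse
second = dupefilter.should_schedule(FakeRequest(url))    # request from parse_items
forced = dupefilter.should_schedule(FakeRequest(url, dont_filter=True))
print(first, second, forced)
```

In this model the second request is rejected, which corresponds to parse_item never running, while the request created with dont_filter=True goes through.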
