Python Scrapy - scrapy.Request not working


My scraper crawls 0 pages and I think the problem resides in the last line of code in the parse method:

def parse(self, response):
    all_companies = response.xpath('//header[@class = "card-header"]')

    for company in all_companies:
        company_url = company.xpath('./a[@class = "card-header-scorecard"]/@href').extract_first()
        yield scrapy.Request(url=company_url, callback=self.parse_company)

I tested the retrieval of company_url in the scrapy shell and the URLs are all returned correctly. The scraper is supposed to visit each of those URLs and scrape the items using the parse_company method.

Before using yield I was using the Rule feature, and it worked perfectly together with parse_company, so I know that method works; however, I had to change my approach out of necessity.

rules = (
    Rule(LinkExtractor(restrict_css=".card-header > a"), callback="parse_company"),
)

CodePudding user response:

You are using CrawlSpider, and in recent versions of Scrapy the CrawlSpider's default callback is _parse instead of parse, so your parse method is never invoked as the default callback. Either override _parse, or subclass scrapy.Spider instead of scrapy.spiders.CrawlSpider if you no longer need the rules mechanism.
