Return non-zero exit code when raising a scrapy.exceptions.UsageError exception


I have a Scrapy script which looks like this:


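# main.py
import os
import argparse
import datetime
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from spiders.mySpider import MySpider

parser = argparse.ArgumentParser(description='My Scrapper')
# Assumption: the option definitions are truncated in the original listing;
# they are reconstructed here from how args.verbose and args.type are used.
parser.add_argument('--verbose',
                    help='Verbose mode',
                    action='store_true')
parser.add_argument('--type')

args = parser.parse_args()

if args.type != 'expected':
    parser.error("Wrong type")

if __name__ == "__main__":
    settings = get_project_settings()
    settings['LOG_ENABLED'] = args.verbose
    process = CrawlerProcess(settings=settings)
    process.crawl(MySpider, type_arg=args.type)
    process.start()  # run the crawl; blocks until the spider finishes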


# mySpider.py
from scrapy import Spider
from scrapy.http import Request, FormRequest
import scrapy.exceptions as ScrapyExceptions

class MySpider(Spider):
    name = 'MyScrapper'
    allowed_domains = ['www.webtoscrape.com']
    start_urls = ['http://www.webtoscrape.com/path/to/page.html']

    def parse(self, response):
        # ...
        # Some logic
        # ...

        if condition:
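            # UsageError forwards keyword arguments to Exception, which rejects them,
            # so the message is passed positionally.
            raise ScrapyExceptions.UsageError("Wrong argument")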

When I call parser.error() in main.py, my process returns a non-zero exit code as expected. However, when I raise scrapy.exceptions.UsageError() in mySpider.py, I receive a 0 exit code, so the Jenkins pipeline step that runs my script thinks it has succeeded and continues with the pipeline execution. I run my script with the command python3 main.py --type my_type.

Why doesn't the script return a non-zero exit code when the usage error is raised in the mySpider.py module?
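
For context, the non-zero code in the first case comes from argparse itself: parser.error() prints the usage message to stderr and terminates with status 2 by raising SystemExit. A minimal sketch that captures this behaviour (the function name exit_code_of_parser_error is made up for illustration and is not part of the script above):

```python
import argparse


def exit_code_of_parser_error(message: str) -> int:
    """Call parser.error() and return the exit status it requests."""
    parser = argparse.ArgumentParser(description="My Scrapper")
    try:
        # error() writes the usage text to stderr, then raises SystemExit.
        parser.error(message)
    except SystemExit as exc:
        return exc.code
    return 0  # not reached: error() always exits
```

An exception raised later inside a spider callback does not go through this path, because nothing turns it into a SystemExit.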

CodePudding user response:

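After several hours of trying different approaches and reading this issue, I found that the problem is that Scrapy does not return a non-zero exit code when a scrape fails: an exception raised inside a spider callback is caught and logged by Scrapy instead of propagating to the calling script. I managed to work around this behaviour by using the Crawler stats collection.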


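# main.py
import sys

if __name__ == "__main__":
    settings = get_project_settings()
    settings['LOG_ENABLED'] = args.verbose
    process = CrawlerProcess(settings=settings)
    process.crawl(MySpider, type_arg=args.type)
    # Keep a reference to the crawler before starting, so its stats can be read afterwards.
    crawler = list(process.crawlers)[0]
    process.start()  # blocks until the crawl finishes

    # The spider sets this stat right before raising; exit non-zero if it is present.
    failed = crawler.stats.get_value('custom/failed_job')
    if failed:
        sys.exit(1)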


# mySpider.py
class MySpider(Spider):
    name = 'MyScrapper'
    allowed_domains = ['www.webtoscrape.com']
    start_urls = ['http://www.webtoscrape.com/path/to/page.html']

    def parse(self, response):
        # ...
        # Some logic
        # ...

        if condition:
            self.crawler.stats.set_value('custom/failed_job', 'True')
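            # UsageError forwards keyword arguments to Exception, which rejects them,
            # so the message is passed positionally.
            raise ScrapyExceptions.UsageError("Wrong argument")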
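
A variation on the same idea, offered as a minimal sketch and not as anything Scrapy ships: the exit code can be derived from the whole stats dictionary instead of a single custom key. The helper name exit_code_from_stats is hypothetical. The custom/failed_job key is the one used above, and the sketch assumes Scrapy's default stats, where each exception raised in a callback increments a spider_exceptions/<ExceptionName> counter.

```python
def exit_code_from_stats(stats: dict) -> int:
    """Return 1 if the crawl stats indicate a failure, otherwise 0.

    `stats` is a plain dict, e.g. the result of crawler.stats.get_stats().
    """
    # Custom flag set by the spider right before raising.
    if stats.get("custom/failed_job"):
        return 1
    # Assumption: Scrapy's default stats record callback exceptions under
    # keys such as "spider_exceptions/UsageError".
    if any(key.startswith("spider_exceptions/") and count
           for key, count in stats.items()):
        return 1
    return 0
```

In main.py this would be called once process.start() has returned, for example sys.exit(exit_code_from_stats(crawler.stats.get_stats())). Checking the exception counters as well means a failure is still reported if a callback raises without setting the custom flag first.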