Is there an automatic way to stop a Scrapy crawler when it produces errors?


In general, I run my Scrapy crawler using the following command:

scrapy crawl <spider_name>

After running, it crawls the desired elements from the target resource, but I have to monitor the results shown on the screen to find errors (if any) and stop the crawler manually.

How can I automate this procedure? Is there an automatic way to stop the crawler when it can't crawl a desired element and fails to fetch it?

CodePudding user response:

spider.py:

import scrapy
from scrapy.exceptions import CloseSpider


class SomeSpider(scrapy.Spider):
    name = 'somespider'

    allowed_domains = ['example.com']
    start_urls = ['https://example.com']


    def parse(self, response):
        try:
            # placeholder for your actual extraction logic
            something()
        except Exception as e:
            print(e)
            # CloseSpider stops the whole crawl gracefully with the given reason
            raise CloseSpider("Some error")
        # if you want to catch a bad status you can also do:
        # if response.status != 200: ..... (sketched below)
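
For completeness, here is a minimal sketch of the bad-status variant mentioned in the comment above. Note that by default Scrapy's HttpError middleware drops non-2xx responses before they reach parse, so the standard handle_httpstatus_list spider attribute is assumed here to let those responses through; the listed status codes are only examples:

import scrapy
from scrapy.exceptions import CloseSpider


class StatusCheckSpider(scrapy.Spider):
    name = 'statuscheckspider'

    allowed_domains = ['example.com']
    start_urls = ['https://example.com']

    # assumption: let these error statuses reach parse() instead of being filtered out
    handle_httpstatus_list = [404, 500, 503]

    def parse(self, response):
        # stop the whole crawl as soon as a page comes back with a bad status
        if response.status != 200:
            raise CloseSpider(f"Bad status {response.status} on {response.url}")
        # ... normal extraction logic goes here ...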

CodePudding user response:

I think you are looking for logging; see the logging section of the Scrapy documentation.

I find it useful to use something like this:

import logging
import scrapy

logger = logging.getLogger('mycustomlogger')

class MySpider(scrapy.Spider):

    name = 'myspider'
    start_urls = ['https://scrapy.org']

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)
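
Tying this back to the question, here is a sketch building on the snippet above (the missing-element check on the page title is hypothetical): log the failure and then raise CloseSpider, so the error is recorded and the crawl stops without manual monitoring. The log can also be written to a file with the standard command-line options, e.g. scrapy crawl myspider --logfile myspider.log --loglevel ERROR, so the screen does not have to be watched at all.

import logging

import scrapy
from scrapy.exceptions import CloseSpider

logger = logging.getLogger('mycustomlogger')


class MySpider(scrapy.Spider):

    name = 'myspider'
    start_urls = ['https://scrapy.org']

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)
        # hypothetical check: treat a missing <title> as a failed crawl
        title = response.css('title::text').get()
        if title is None:
            logger.error('Missing <title> on %s', response.url)
            raise CloseSpider('missing element')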