In general, I run my Scrapy crawler using the following command:
scrapy crawl <spider_name>
After running, it crawls the desired elements from the target resource, but I have to monitor the results shown on the screen to find errors (if any) and stop the crawler manually.
How can I automate this procedure? Is there an automatic way to stop the crawler when it can't crawl a desired element and fails to fetch it?
CodePudding user response:
spider.py:
import scrapy
from scrapy.exceptions import CloseSpider

class SomeSpider(scrapy.Spider):
    name = 'somespider'
    allowed_domains = ['example.com']
    start_urls = ['https://example.com']

    def parse(self, response):
        try:
            something()  # placeholder for your actual scraping logic
        except Exception as e:
            print(e)
            raise CloseSpider("Some error")
        # if you want to catch a bad status you can also do:
        # if response.status != 200: .....
CodePudding user response:
I think you are looking for logging. The documentation for logging is here.
I find it useful to use:
import logging
import scrapy

logger = logging.getLogger('mycustomlogger')

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://scrapy.org']

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)
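If the point is to avoid watching the screen, a sketch that builds on this could send the log to a file and record an ERROR whenever an expected element is missing. The log file name and the selector are only examples, not anything prescribed by Scrapy:
import logging
import scrapy

logger = logging.getLogger('mycustomlogger')

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://scrapy.org']
    custom_settings = {
        'LOG_FILE': 'myspider.log',  # write the log to a file instead of the console
        'LOG_LEVEL': 'INFO',         # record INFO and above
    }

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)
        title = response.css('title::text').get()
        if title is None:
            # ERROR lines in the log file make failed pages easy to find afterwards
            logger.error('No <title> found on %s', response.url)
        else:
            yield {'title': title}
You can then scan the log file for ERROR lines after the run, or combine this with the CloseSpider approach from the other answer to stop the crawl immediately.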