I am checking the response status of a bunch of websites and exporting the results to a CSV file. A couple of the websites raise a DNSLookupError (or "no website found"), and for those nothing gets stored in the CSV at all. How can I store the DNSLookupError message in the CSV along with the URL?
def parse(self, response):
    yield {
        'URL': response.url,
        'Status': response.status
    }
CodePudding user response:
You can use the errback
argument of scrapy.Request to catch DNS errors (or any other type of error). See the sample usage below.
import scrapy
from twisted.internet.error import DNSLookupError

class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['example.com']

    def start_requests(self):
        # parse() is the default callback; errback handles failed requests
        yield scrapy.Request(url="http://example.com/error", errback=self.parse_error)

    def parse_error(self, failure):
        if failure.check(DNSLookupError):
            # failure.request is the original request that failed
            request = failure.request
            yield {
                'URL': request.url,
                # str() turns the exception into a readable message for the CSV
                'Status': str(failure.value)
            }

    def parse(self, response):
        yield {
            'URL': response.url,
            'Status': response.status
        }