I am checking the response status of a bunch of websites and exporting the results to a CSV file. A couple of the websites raise a DNSLookupError (or "no website found"), and for those nothing gets stored in the CSV at all. How can I store the DNSLookupError message in the CSV along with the URL?
def parse(self, response):
    yield {
        'URL': response.url,
        'Status': response.status
    }
CodePudding user response:
You can use the errback
argument of scrapy.Request to catch DNS errors (or any other type of error). See the sample usage below.
import scrapy
from twisted.internet.error import DNSLookupError

class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['example.com']

    def start_requests(self):
        # parse() is the default callback; errback handles failed requests
        yield scrapy.Request(url="http://example.com/error", errback=self.parse_error)

    def parse_error(self, failure):
        if failure.check(DNSLookupError):
            # failure.request is the original request that failed
            request = failure.request
            yield {
                'URL': request.url,
                # str() turns the exception into a readable message for the CSV
                'Status': str(failure.value)
            }

    def parse(self, response):
        yield {
            'URL': response.url,
            'Status': response.status
        }