I want to check the response status and export it to a CSV file using Scrapy. I tried with response.status, but it only ever shows '200' and exports that to the CSV file. How can I get the other status codes, like "404", "502", etc.?
def parse(self, response):
    yield {
        'URL': response.url,
        'Status': response.status
    }
CodePudding user response:
By default, Scrapy's HttpError spider middleware filters out non-2xx responses before they ever reach your callback, which is why you only see 200. You can adjust these settings so that the error responses get through:
HTTPERROR_ALLOWED_CODES
Default: []
Pass all responses with non-200 status codes contained in this list.
HTTPERROR_ALLOW_ALL
Default: False
Pass all responses, regardless of status code.
settings.py
HTTPERROR_ALLOW_ALL = True
HTTPERROR_ALLOWED_CODES = [500, 501, 404]  # list whichever codes you need
Note that you only need one of the two: with HTTPERROR_ALLOW_ALL = True every response is passed through, so HTTPERROR_ALLOWED_CODES becomes redundant.
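If you'd rather not change the project-wide settings, the same override can be applied to a single spider through its custom_settings attribute. A minimal sketch (the spider name and URL are made up for illustration):

```python
import scrapy


class StatusSpider(scrapy.Spider):
    # hypothetical name/start URL, just for this example
    name = 'status'
    start_urls = ['https://example.com/error']

    # per-spider override: let every response through to parse()
    custom_settings = {
        'HTTPERror_ALLOW_ALL'.upper(): True,  # i.e. 'HTTPERROR_ALLOW_ALL'
    }

    def parse(self, response):
        yield {
            'URL': response.url,
            'Status': response.status,
        }
```

With this in place, parse() is called for 404s, 502s, and so on, and response.status carries the real code.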
CodePudding user response:
You can add an errback to the request, catch the HTTP error in the errback function, and yield the required information there. See the errback section of the Request docs for more detail. Sample below:
import scrapy
from scrapy.spidermiddlewares.httperror import HttpError


class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['example.com']

    def start_requests(self):
        yield scrapy.Request(url="https://example.com/error", errback=self.parse_error)

    def parse_error(self, failure):
        if failure.check(HttpError):
            # these exceptions come from the HttpError spider middleware;
            # the non-200 response is attached to the failure
            response = failure.value.response
            yield {
                'URL': response.url,
                'Status': response.status
            }

    def parse(self, response):
        yield {
            'URL': response.url,
            'Status': response.status
        }
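With either approach, the yielded items can be written out by Scrapy's built-in feed exporter; the output filename here is just an example:

```shell
scrapy crawl test -o output.csv
```

The -o flag appends to the file across runs; newer Scrapy versions also support -O to overwrite it instead.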