How do you obtain underlying failed request data when catching requests.exceptions.RetryError?


I am using a fairly standard pattern for adding retry behavior to HTTP requests in Python with the requests library:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

retry_strategy = Retry(
    total=HTTP_RETRY_LIMIT,
    status_forcelist=HTTP_RETRY_CODES,
    method_whitelist=HTTP_RETRY_METHODS,
    backoff_factor=HTTP_BACKOFF_FACTOR
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)

...

try:
    response = http.get(... some request params ...)
except requests.exceptions.RetryError as err:
    # Do logic with err to perform error handling & logging.

Unfortunately the docs on RetryError don't explain anything, and when I intercept the exception object as above, err.response is None. You can call str(err) to get the exception's message string, but recovering the specific response details from it would require unreasonable string parsing, and even then the message elides the necessary details. For example, a deliberate call to a site returning 400s (not that you would really retry on those, but just for debugging) produces the message "(Caused by ResponseError('too many 400 error responses'))", which omits the actual response details, such as the site's own description of the 400 error (which could be critical for determining how to handle it, or even just for logging).

What I want to do is receive the response for the last unsuccessful retry attempt and use the status code and description of that specific failure to determine the handling logic. Even though I want to make it robust behind retries, I still need to know the underlying failure beyond "too many retries" when ultimately handling the error.

Is it possible to extract this information from the exception raised for retries?

CodePudding user response:

It's not directly supported by the libraries, but it can be achieved by subclassing Retry to attach the response to the MaxRetryError:

from requests.adapters import MaxRetryError, Retry


class MyRetry(Retry):

    def increment(self, *args, **kwargs):
        try:
            return super().increment(*args, **kwargs)
        except MaxRetryError as ex:
            # urllib3 passes the last response to increment() as a keyword argument.
            response = kwargs.get('response')
            if response:
                # Buffer the body so response.data remains readable after
                # the connection is released.
                response.read(cache_content=True)
                ex.response = response
            raise

Usage — build the strategy with MyRetry instead of Retry, then read the attached response in the error handler:

# retry_strategy = Retry(
retry_strategy = MyRetry(
    ...  # same arguments as before
)

# Then, inside the except block:
# Do logic with err to perform error handling & logging.
print(err.args[0].response.status)  # status code of the last failed attempt
print(err.args[0].response.data)    # body of the last failed attempt
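To see the whole approach working without external network access, here is a self-contained sketch: the throwaway local HTTP server that always answers 500 is purely illustrative (a stand-in for any failing endpoint), and the retry parameters are arbitrary.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.adapters import HTTPAdapter, MaxRetryError, Retry


class MyRetry(Retry):
    def increment(self, *args, **kwargs):
        try:
            return super().increment(*args, **kwargs)
        except MaxRetryError as ex:
            response = kwargs.get('response')
            if response:
                response.read(cache_content=True)  # buffer body for later access
                ex.response = response
            raise


class AlwaysFailing(BaseHTTPRequestHandler):
    """Minimal handler that answers every GET with a 500 and a short body."""

    def do_GET(self):
        body = b'server-side failure detail'
        self.send_response(500)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


server = HTTPServer(('127.0.0.1', 0), AlwaysFailing)
threading.Thread(target=server.serve_forever, daemon=True).start()

http = requests.Session()
http.mount('http://', HTTPAdapter(max_retries=MyRetry(total=2, status_forcelist=[500])))

try:
    http.get(f'http://127.0.0.1:{server.server_port}/')
except requests.exceptions.RetryError as err:
    last = err.args[0].response  # attached by MyRetry.increment
    print(last.status)           # 500
    print(last.data)             # b'server-side failure detail'
finally:
    server.shutdown()
```

With the response attached, the handler can branch on the real status code and body instead of parsing "too many ... error responses" out of the message string.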

CodePudding user response:

As aaron already indicated, the error you are trying to catch and the one actually raised by the library are not the same. This also depends heavily on the version of the library used, as the Retry API has changed over time (Retry is also importable via from requests.adapters import Retry, and so is RetryError).

Working Code

For the following code, tested on requests==2.27.1 and python==3.7.12, with Retry imported from urllib3 as in your question:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


HTTP_RETRY_LIMIT = 1
HTTP_RETRY_CODES = [403, 400, 401, 429, 500, 502, 503, 504]
HTTP_RETRY_METHODS = ['HEAD', 'GET', 'OPTIONS', 'TRACE', 'POST']
HTTP_BACKOFF_FACTOR = 1

retry_strategy = Retry(
    total=HTTP_RETRY_LIMIT,
    status_forcelist=HTTP_RETRY_CODES,
    allowed_methods=HTTP_RETRY_METHODS, # changed to allowed_methods
    backoff_factor=HTTP_BACKOFF_FACTOR
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)
try:
    response = http.get('https://www.howtogeek.com/wp-content/uploads/2018/06/')
except (requests.exceptions.RetryError, requests.exceptions.ConnectionError) as err:
    # Do logic with err to perform error handling & logging.
    print(err)
    print(err.args[0].reason)

I got the following output:

requests.exceptions.RetryError: HTTPSConnectionPool(host='www.howtogeek.com', port=443): Max retries exceeded with url: /wp-content/uploads/2018/06/ (Caused by ResponseError('too many 403 error responses'))
too many 403 error responses
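The structure of err can be illustrated without any network call by constructing the same exception chain by hand; the URL and message below simply mirror the output above, and a real error would carry the urllib3 connection pool instead of None:

```python
from requests.exceptions import RetryError
from urllib3.exceptions import MaxRetryError, ResponseError

inner = MaxRetryError(
    pool=None,  # a real error carries the urllib3 connection pool here
    url='/wp-content/uploads/2018/06/',
    reason=ResponseError('too many 403 error responses'),
)

try:
    raise RetryError(inner)
except RetryError as err:
    print(err.args[0].reason)  # too many 403 error responses
    print(err.args[0].url)     # /wp-content/uploads/2018/06/
```

So err.args[0] is the underlying MaxRetryError, whose reason and url attributes are directly accessible without string parsing.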

Alternative with sys.exc_info()

If this isn't enough, you can import the traceback package or use sys.exc_info() (indexed with 0, 1, or 2; see the related discussion on Stack Overflow for more). In your case you would do something like:

import traceback, sys
try:
    response = http.get('https://www.howtogeek.com/wp-content/uploads/2018/06/')
except (requests.exceptions.RetryError, requests.exceptions.ConnectionError) as err:
    # Do logic with err to perform error handling & logging.
    print(sys.exc_info()[0]) # just the class of the exception, check the link for more info

This returns the exception class, which you can use for error handling; it can also be combined with catching the generic Exception:

<class 'requests.exceptions.ConnectionError'>

This gives you a lot of control: info = sys.exc_info()[1] yields the actual exception object, whose attributes you can then access:

print(info.request.url)
print(info.request.headers)
# and probably most important for you
print(info.args[0].reason) # urllib3.exceptions.ResponseError('too many 403 error responses')

And obtain the resulting info you require:

https://www.howtogeek.com/wp-content/uploads/2018/06/
{'User-Agent': 'python-requests/2.27.1', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
too many 403 error responses
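One detail worth noting: inside the except block, sys.exc_info()[1] is the very object already bound by "as err", so both routes expose the same attributes. A minimal demonstration, raising the exception by hand rather than via a real request:

```python
import sys
from requests.exceptions import RetryError

try:
    raise RetryError('too many retries')  # stand-in for a real failed request
except RetryError as err:
    info = sys.exc_info()[1]
    same_object = info is err
    print(same_object)                 # True
    print(sys.exc_info()[0].__name__)  # RetryError
```

sys.exc_info() is mainly useful when the handler catches a broad exception type and needs to recover the concrete class afterwards.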

Alternatively, the full traceback gives even more information (though extracting details from it depends on parsing):

print(traceback.format_exc()) # Returns full stack trace, might not be most useful in your case