How to try accessing webpage again after http errors in python for loop


[Screenshots: examples of the 522, 525, and 504 errors shown when visiting the webpage manually]

I am running the following for loop, which goes through a dictionary of subreddits (keys) and URLs (values). Each URL returns a dictionary with all posts from 2022 for a given subreddit. Sometimes the for loop stops and produces an 'HTTP Error 525' or other errors.

I'm wondering how I can check for these errors when reading the URL and then retry until the error no longer occurs before moving on to the next subreddit.

import urllib.request

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    page = urllib.request.urlopen(url).read()
    dict_last_posts[subredd] = page

I haven't been able to figure it out.

CodePudding user response:

You can put this code in a try/except block like this:

import urllib.request
import urllib.error

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    while True:
        try:
            page = urllib.request.urlopen(url).read()
            dict_last_posts[subredd] = page
            break  # exit the while loop if the request succeeded
        except urllib.error.HTTPError as e:
            if e.code in (504, 522, 525):
                print("Encountered HTTP error while reading URL. Retrying...")
            else:
                raise  # re-raise the exception if it's a different error

This code will catch any HTTPError that occurs while reading the URL and check whether the error code is 504, 522, or 525. If it is, it prints a message and tries reading the URL again. If it's a different error, it re-raises the exception so that you can handle it appropriately.

NOTE: This code will retry reading the URL indefinitely until it succeeds or a different error occurs. You may want to add a counter or a timeout to prevent the loop from going on forever in case the error persists.
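
For example, a bounded variant might cap the number of attempts and pause briefly between them. This is only a sketch of that idea; max_attempts and the 5-second pause are arbitrary values, not something from the answer above:

import time
import urllib.request
import urllib.error

max_attempts = 10  # give up on a subreddit after this many tries

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    for attempt in range(max_attempts):
        try:
            dict_last_posts[subredd] = urllib.request.urlopen(url).read()
            break  # success, move on to the next subreddit
        except urllib.error.HTTPError as e:
            if e.code in (504, 522, 525):
                print(f"HTTP {e.code}, retrying ({attempt + 1}/{max_attempts})...")
                time.sleep(5)  # brief pause before the next attempt
            else:
                raise  # a different error, re-raise it
    else:
        print(f"Giving up on {subredd} after {max_attempts} attempts")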

CodePudding user response:

It's unwise to retry a request indefinitely. Set a limit, even a very high one, but don't set it so high that you end up being rate limited (HTTP status 429). The backoff_factor will also have an impact on rate limiting.

Use the requests package for this. It makes it easy to set a custom adapter for all of your requests via a Session, and it includes Retry from urllib3, which encapsulates the retry behavior in an object you can pass to your adapter.

import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()
retries = Retry(
    total=5,                          # maximum number of retries per request
    backoff_factor=0.1,               # scales the (exponential) wait between retries
    status_forcelist=[504, 522, 525]  # status codes that trigger a retry
)
s.mount('https://', HTTPAdapter(max_retries=retries))  # apply to all https:// requests

for subredd, url in dict_last_subreddit_posts.items():
    response = s.get(url)
    dict_last_posts[subredd] = response.content

You can play around with total (maximum number of retries) and backoff_factor (adjusts wait time between retries) to get the behavior you want.
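
One caveat worth noting: with Retry's default raise_on_status, exhausting the retries on one of the listed status codes makes the session raise requests.exceptions.RetryError instead of returning the last error response. If you'd rather skip a persistently failing subreddit than stop the whole loop, something like the sketch below works; the skip-and-continue policy is an assumption, not something the answer above prescribes.

import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()
s.mount('https://', HTTPAdapter(max_retries=Retry(
    total=5,
    backoff_factor=0.1,
    status_forcelist=[504, 522, 525]
)))

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    try:
        response = s.get(url)
    except requests.exceptions.RetryError:
        print(f"{subredd}: still failing after all retries, skipping")
        continue  # assumed policy: move on to the next subreddit
    dict_last_posts[subredd] = response.content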

CodePudding user response:

Try something like this:

import urllib.request
import urllib.error

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    # urlopen raises HTTPError for 4xx/5xx responses instead of returning
    # them, so the status has to be checked via the exception's code.
    while True:
        try:
            http_response = urllib.request.urlopen(url)
            break  # got a successful response
        except urllib.error.HTTPError as e:
            if e.code == 503:
                continue  # retry the request
            elif e.code == 523:
                pass  # enter code here
            else:
                pass  # enter code here
    dict_last_posts[subredd] = http_response.read()

But Michael Ruth's answer is better.
