I have a database of thousands of files online, and I want to check their status (e.g. whether the file exists, whether it returns a 404, etc.) and update this in my database.
I've used urllib.request to download files in a Python script. However, downloading terabytes of files would obviously take a long time. Parallelizing the process would help, but ultimately I don't want to download all the data, just check each file's status. Is there an ideal way to check (using urllib or another package) the HTTP response code of a given URL?
Additionally, if I can get the file size from the server (which should be in the HTTP response headers), I can also update that in my database.
CodePudding user response:
If your web server is standards-compliant, you can use a HEAD request instead of a GET. It returns the same status code and headers without actually fetching the page body.
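For example, here is a minimal sketch using urllib.request (Python 3; the URL, timeout, and helper name are just placeholders for illustration):
import urllib.request
import urllib.error

def check_url(url):
    # Send a HEAD request: the server replies with the status and headers only.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            # Content-Length is the file size in bytes, if the server reports it.
            return resp.status, resp.headers.get("Content-Length")
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses raise HTTPError; the status code is still useful (e.g. 404).
        return e.code, None

print(check_url('https://www.google.com'))  # prints a (status, size) tuple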
CodePudding user response:
The requests module can check the status code of a response. Just do:
import requests
url = 'https://www.google.com' # Change to your link
response = requests.get(url)
print(response.status_code)
This prints 200, so the request was successful.
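Since a plain GET downloads the whole response body, you can combine this with the HEAD approach above to avoid that. A minimal sketch, assuming the server answers HEAD requests and reports Content-Length:
import requests

url = 'https://www.google.com'  # Change to your link
response = requests.head(url, allow_redirects=True, timeout=10)
print(response.status_code)                    # e.g. 200, or 404 if the file is gone
print(response.headers.get('Content-Length'))  # file size in bytes, if the server reports it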