I have a database of thousands of files online, and I want to check their status (e.g. whether the file exists, whether it returns a 404, etc.) and update this in my database.
I've used urllib.request to download files in a Python script. However, downloading terabytes of files would obviously take a long time. Parallelizing the process would help, but ultimately I don't want to download all the data, just check each file's status. Is there an ideal way to check (using urllib or another package) the HTTP response code of a given URL?
Additionally, if I can get the file size from the server (which should be in the HTTP response headers), I can also update that in my database.
CodePudding user response:
If your web server is standards-compliant, you can use a HEAD request instead of a GET. It returns the same status code and headers without actually fetching the page body.
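For example, here is a minimal sketch using urllib.request (Python 3; the URL, timeout, and helper name are just placeholders for illustration):
import urllib.request
import urllib.error

def check_url(url):
    # Send a HEAD request: the server replies with the status and headers only.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            # Content-Length is the file size in bytes, if the server reports it.
            return resp.status, resp.headers.get("Content-Length")
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses raise HTTPError; the status code is still useful (e.g. 404).
        return e.code, None

print(check_url('https://www.google.com'))  # prints a (status, size) tuple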
CodePudding user response:
The requests module can check the status code of a response. Just do:
import requests
url = 'https://www.google.com' # Change to your link
response = requests.get(url)
print(response.status_code)
This prints 200, so the request was successful.
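Since a plain GET downloads the whole response body, you can combine this with the HEAD approach above to avoid that. A minimal sketch, assuming the server answers HEAD requests and reports Content-Length:
import requests

url = 'https://www.google.com'  # Change to your link
response = requests.head(url, allow_redirects=True, timeout=10)
print(response.status_code)                    # e.g. 200, or 404 if the file is gone
print(response.headers.get('Content-Length'))  # file size in bytes, if the server reports it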