I have a list of JavaScript URLs and I want to check whether they exist on a remote server, i.e. don't give me a 404 Not Found. Sometimes a server also returns a 200 OK status code but serves a 404 Not Found page, because the file doesn't exist.
wget $url
works fine for me, but I don't want to download the file; I just want to check whether it can be downloaded.
I tried wget --spider $url
but that didn't work for many URLs. For example,
wget --spider https://www.codecademy.com/cdn-cgi/challenge-platform/h/g/scripts/alpha/invisible.js
gives me 404 Not Found. broken link!!!
even though when I open the file in my browser I can see the JavaScript content!
How can I check whether a JavaScript file can be downloaded, without downloading it to my machine at all?
Edit: I have tried many things, but somehow this URL https://www.codecademy.com/cdn-cgi/challenge-platform/h/g/scripts/alpha/invisible.js
works fine in the browser, yet when I try to check whether it exists using any bash or Python code, it gives me a 404 status code.
CodePudding user response:
You can use Python's urllib module to check whether a file exists at a given URL. Here's a sample; the results are saved in the "results" dictionary.
from urllib.request import urlopen

urls = ['https://www.codecademy.com/cdn-cgi/challenge-platform/h/g/scripts/alpha/invisible.js',
        'https://www.codecademy.com/idontexist.js']
results = {}
for url in urls:
    state = "Not exists"  # default; overwritten when the request succeeds
    try:
        ret = urlopen(url)
        if ret.code == 200:
            state = "Exists"
    except Exception:
        # urlopen raises HTTPError for 404s and URLError for connection problems
        pass
    print(url + ' : {}'.format(state))
    results[url] = state
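
Note that urlopen with its default headers may still get a 404 for the codecademy URL above: some CDNs (Cloudflare's challenge-platform paths among them) appear to serve different responses to clients that don't look like a browser. Below is a minimal sketch of a variant that sends a browser-like User-Agent header and uses a HEAD request, so only headers are transferred and nothing is downloaded. The User-Agent string and the url_exists helper are illustrative names of my own, not anything from urllib:

from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Illustrative browser-like User-Agent string; adjust as needed.
HEADERS = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'}

def url_exists(url):
    # method='HEAD' asks the server for headers only, so the body is never downloaded
    req = Request(url, headers=HEADERS, method='HEAD')
    try:
        with urlopen(req) as resp:
            return resp.status == 200
    except (HTTPError, URLError):
        return False

for url in urls:
    print(url, ':', 'Exists' if url_exists(url) else 'Not exists')

One caveat: a few servers reject HEAD with 405 Method Not Allowed even though GET works; if you see that, fall back to a GET and simply don't read the body. And for servers that return 200 OK with a "404" error page, a status-code check alone is not enough; you would have to fetch at least part of the body and inspect its content.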