Downloading PDF files from a PHP server || saving not available files

I am trying to download the PDFs (very rarely, a few are Word files) located on a PHP server. On the server, the files appear to be numbered sequentially from 1 to 14000. Each one can be downloaded from http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=X, where X is a number in the [1, 14000] range. My plan is to loop over all the values in [1, 14000] and save every file in a specific folder. The problem is that when no PDF exists for a given X, the code still creates a PDF file of zero bytes. The code below runs a test on the X values from 13980 to 14000, for which PDFs do not exist.

import requests

# Build (id, url) pairs for the test range 13980..14000, for which no PDFs exist.
base_url = 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id='
urls = [(str(n), base_url + str(n)) for n in range(13980, 14001)]

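s = requests.Session()  # one session is enough for all requests

for number, url in urls:
    response = s.get(url)

    # The with block closes the file automatically, so no f.close() is needed.
    with open("/Users/aartimalik/Downloads/test/" + number + "_phptest.pdf", "wb") as f:
        f.write(response.content)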

This code saves 0-byte PDFs because no PDFs exist for those numbers. I want it to save a .pdf file only if a PDF exists for a given X, and to report "no pdf file" otherwise. I'm not sure whether that is possible using with open. Any help is appreciated. Thanks!

CodePudding user response：

The following worked for the Word files (it can be modified to include PDFs):

import requests
import os

os.chdir("/Users/aartimalik/Documents/GitHub/revenue_procurement/pdfs")

# phpurldoc is presumably a local module that defines urls as (number, url) pairs.
from phpurldoc import urls

print(urls)

s = requests.Session()

for number, url in urls:
    response = s.get(url)
    # Take the file name from the header, i.e. the text after the last "=".
    h = response.headers["Content-Disposition"].split("=")[-1]

    # A name ending in "x" is treated as .docx; anything else as .doc.
    ext = ".docx" if h[-1] == "x" else ".doc"
    with open("./bidsummaries-doc/" + h + "_" + number + ext, "wb") as f:
        f.write(response.content)
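
To also cover the original request (write a file only when one exists, otherwise report "no pdf file"), the same header check could be moved in front of the open call. This is only a sketch. It assumes the server sends an empty body or no Content-Disposition header for ids without a file, which I have not verified against the site. The helper names extension_for, save_if_present and download_range are made up for illustration.

```python
import os


def extension_for(response):
    """Return a file extension taken from Content-Disposition, or None.

    Assumption: a missing header or an empty body means "no file for this id".
    """
    disposition = response.headers.get("Content-Disposition", "")
    if not response.content or "=" not in disposition:
        return None
    # Text after the last "=" is taken as the file name; drop optional quotes.
    filename = disposition.split("=")[-1].strip().strip('"')
    ext = os.path.splitext(filename)[1].lower()
    return ext or ".pdf"  # assumption: a name without an extension is a PDF


def save_if_present(response, number, out_dir):
    """Write the body to out_dir and return its path, or None if no file."""
    ext = extension_for(response)
    if ext is None:
        return None
    path = os.path.join(out_dir, f"{number}{ext}")
    with open(path, "wb") as f:
        f.write(response.content)
    return path


def download_range(session, first, last, out_dir):
    """Fetch ids first..last (inclusive) with a requests-style session."""
    base = "http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php"
    os.makedirs(out_dir, exist_ok=True)
    for number in range(first, last + 1):
        response = session.get(base, params={"id": number}, timeout=30)
        if save_if_present(response, number, out_dir) is None:
            print(f"{number}: no pdf file")
```

With requests imported, calling download_range(requests.Session(), 13980, 14000, "test") would repeat the test from the question. Because open is never reached for a missing id, no zero-byte files are created.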