Download PDF from link that auto generates it in Python

Time:11-10

I've been trying to write a program that downloads PDFs from links that auto-generate them, and then renames the files, but I keep failing.
For example, this link: "https://checkaproduct.se.com/DistantRequestDispatcher.aspx?action=export&pid=62035238&lang=en_us"
When you open it, Chrome downloads the file automatically under an unhelpful name, while Vivaldi, for example, asks whether you want to save the file. I have no problems with links that end in .pdf, but these are a headache. I need to automate the process because I usually have thousands of PDFs like this to download, and doing it by hand is not realistic.
I've tried:

from pathlib import Path
import requests

filename = Path('test.pdf')
url = 'https://checkaproduct.se.com/DistantRequestDispatcher.aspx?action=export&pid=62035238&lang=en_us'
response = requests.get(url)
print(response.status_code)
filename.write_bytes(response.content)

and

import urllib.request
urllib.request.urlretrieve(url, "filename.pdf")

But the program just hangs and does nothing. Is there a way to download a PDF from a link like that?

CodePudding user response:

You will need to write your own OS (shell) or Python curl calls, but this type of link can be captured within a short time window. Note that the PDF is generated on request and stamped with the current date (for example "Date: Wednesday, November 9, 2022"), so it is NOT a stored PDF but a fresh one each time.

Basically, the first request responds with a Location: header, and you then use that Location value in a second request. This may not work in all cases, but it does for this source.

curl -I --url "https://checkaproduct.se.com/DistantRequestDispatcher.aspx?action=export&pid=62035238&lang=en_us"|find "Location:"

Use the Location: value as a variable (here filled in by hand), and use the pid as a way to name and track the download:

curl -o 62035238.pdf --url "https://checkaproduct.se.com/Download.aspx?qs=elorc1pNdXN4OFpyd3VWc1liN2JRZytKemNQWVdoZkI0R3NMU0FscWFlcTlpVTM0eWVBUFJ0aDZLbC81RjJUR2wwSXA0SFplcjhCMTFIUEhqWHdiUE9Fb2g0TlRmQW5JK2N6akpOc0lCN2JGMlBlNml4a3IvS2ZRaUNtdHd4T2RjeFRsaEdPR0J4aUVIazdxZ2VrTDJnPT0%2%"
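The two pieces of bookkeeping above (turning the Location: value into a full URL, and reusing the pid as the file name) can be sketched in Python with the standard library alone. The helper names pid_from_url and absolute_location are made up for this illustration, and the Location value in the usage comment is a placeholder, not a real token from the server.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# The export link from the question.
EXPORT_URL = (
    "https://checkaproduct.se.com/DistantRequestDispatcher.aspx"
    "?action=export&pid=62035238&lang=en_us"
)


def pid_from_url(url):
    """Return the 'pid' query parameter of an export link, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("pid")
    return values[0] if values else None


def absolute_location(request_url, location):
    """Resolve a Location header value against the URL that was requested.

    A relative value such as '/Download.aspx?qs=...' gets the scheme and host
    of the original request; an absolute value is returned unchanged.
    """
    return urljoin(request_url, location)


# Usage with a placeholder Location value (hypothetical):
#   absolute_location(EXPORT_URL, "/Download.aspx?qs=EXAMPLE")
#   pid_from_url(EXPORT_URL) + ".pdf"
```

Resolving with urljoin rather than gluing strings together means the same code keeps working whether the server sends a relative or an absolute Location.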

Here is a try.bat that takes the pid as its only argument:

set "pid=%1"

curl -I --url "https://checkaproduct.se.com/DistantRequestDispatcher.aspx?action=export&pid=%pid%&lang=en_us"|find "Location:">tmp.txt

for /F "usebackq tokens=1* delims= " %%a in (`type tmp.txt`) do curl -o %pid%.pdf --url "https://checkaproduct.se.com%%b"

start "" %pid%.pdf
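Since the question asks for Python, here is a rough equivalent of try.bat using only the standard library. It is a sketch under assumptions: that the server still answers the first request with a redirect carrying a Location: header, and that a HEAD request (what curl -I sends) is enough to get it. The names NoRedirect, export_url, find_location and download_pdf, and the 30 second timeout, are choices made for this example rather than anything the site prescribes.

```python
import urllib.error
import urllib.request
from pathlib import Path
from urllib.parse import urlencode, urljoin

BASE = "https://checkaproduct.se.com/DistantRequestDispatcher.aspx"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the Location header can be read."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def export_url(pid, lang="en_us"):
    """Build the export link for one product id."""
    return BASE + "?" + urlencode({"action": "export", "pid": pid, "lang": lang})


def find_location(url, timeout=30):
    """First call: return the Location header of the redirect, or None."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.headers.get("Location")
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        raise


def download_pdf(pid, out_dir=Path("."), timeout=30):
    """Second call: fetch the generated PDF and save it as <pid>.pdf."""
    first = export_url(pid)
    location = find_location(first, timeout)
    if not location:
        raise RuntimeError(f"no Location header returned for pid {pid}")
    target = Path(out_dir) / f"{pid}.pdf"
    with urllib.request.urlopen(urljoin(first, location), timeout=timeout) as response:
        target.write_bytes(response.read())
    return target


# Example (needs network access):
#   download_pdf("62035238")
```

For thousands of files, calling download_pdf in a loop over the pids and catching exceptions per pid would let one failure be logged without stopping the rest. The explicit timeout makes a stalled request raise an error instead of waiting indefinitely.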
