I've been trying to write a program that downloads PDFs from links that "auto-generate" them and then renames the files, but I'm failing miserably.
Example link: "https://checkaproduct.se.com/DistantRequestDispatcher.aspx?action=export&pid=62035238&lang=en_us"
When you open it, the browser either downloads the file automatically under some unhelpful name (Chrome) or asks whether you want to save it (e.g. Vivaldi).
I have no problems with links that end in .pdf, but these ones are a headache for me.
I have to automate this process because I usually have thousands of PDFs like this to download, and doing it manually is out of the question.
I've tried:
from pathlib import Path
import requests
filename = Path('test.pdf')
url = 'https://checkaproduct.se.com/DistantRequestDispatcher.aspx?action=export&pid=62035238&lang=en_us'
response = requests.get(url)
print(response.status_code)
filename.write_bytes(response.content)
and
import urllib.request
urllib.request.urlretrieve(url, "filename.pdf")
But the program just hangs and does nothing. Is there a way to download a PDF from a link like that?
CodePudding user response:
You will need to write your own OS or Python curl calls, but that type of link can be captured within a short time window. Note that the PDF is generated on request and carries today's date (Date: Wednesday, November 9, 2022), so it is NOT a stored PDF but a fresh generation.
Basically, the first call to the link responds with a Location: header, and you then use that Location: value as the URL of a second call. This may not work in all cases, but it does for this source.
curl -I --url "https://checkaproduct.se.com/DistantRequestDispatcher.aspx?action=export&pid=62035238&lang=en_us"|find "Location:"
Use the Location: value as a variable (filled in by hand here), and use the pid to name the download:
curl -o 62035238.pdf --url "https://checkaproduct.se.com/Download.aspx?qs=elorc1pNdXN4OFpyd3VWc1liN2JRZytKemNQWVdoZkI0R3NMU0FscWFlcTlpVTM0eWVBUFJ0aDZLbC81RjJUR2wwSXA0SFplcjhCMTFIUEhqWHdiUE9Fb2g0TlRmQW5JK2N6akpOc0lCN2JGMlBlNml4a3IvS2ZRaUNtdHd4T2RjeFRsaEdPR0J4aUVIazdxZ2VrTDJnPT0%2%"
Here is a try.bat that takes the pid as its only argument:
set "pid=%1"
curl -I --url "https://checkaproduct.se.com/DistantRequestDispatcher.aspx?action=export&pid=%pid%&lang=en_us"|find "Location:">tmp.txt
for /F "usebackq tokens=1* delims= " %%a in (`type tmp.txt`) do curl -o %pid%.pdf --url "https://checkaproduct.se.com%%b"
start "" %pid%.pdf
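Since the question asks for Python, the same two-step idea can be sketched with the standard library only. This is a rough equivalent of the batch file, not a tested drop-in: it assumes the server answers a HEAD request with a Location: header (as the curl -I call above shows) and that the second URL returns the PDF bytes directly. The names export_url, request and download_pdf are made up for this example, and the 30-second timeout is an arbitrary choice so that a stalled connection raises an error instead of hanging forever.

```python
import http.client
import sys
from pathlib import Path
from urllib.parse import urlencode, urljoin, urlsplit

BASE = "https://checkaproduct.se.com/DistantRequestDispatcher.aspx"


def export_url(pid, lang="en_us"):
    """Build the export link for one product id (same shape as the link in the question)."""
    return BASE + "?" + urlencode({"action": "export", "pid": pid, "lang": lang})


def request(method, url, timeout=30):
    """Send one HTTPS request WITHOUT following redirects; return (status, headers, body)."""
    parts = urlsplit(url)
    conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request(method, target)
        resp = conn.getresponse()
        return resp.status, resp.headers, resp.read()
    finally:
        conn.close()


def download_pdf(pid, out_dir="."):
    """Step 1: ask for the Location header. Step 2: fetch it and save as <pid>.pdf."""
    first = export_url(pid)
    status, headers, _ = request("HEAD", first)
    location = headers.get("Location")
    if location is None:
        raise RuntimeError(f"no Location header for pid {pid} (status {status})")
    # The Location value is a path such as /Download.aspx?qs=..., so join it with the host.
    status, _, body = request("GET", urljoin(first, location))
    # Assumption: a real PDF starts with the bytes %PDF; anything else is treated as a failure.
    if status != 200 or not body.startswith(b"%PDF"):
        raise RuntimeError(f"unexpected response for pid {pid} (status {status})")
    target = Path(out_dir) / f"{pid}.pdf"
    target.write_bytes(body)
    return target


if __name__ == "__main__":
    # Usage: python get_pdfs.py 62035238 [more pids ...]
    for pid in sys.argv[1:]:
        print(download_pdf(pid))
```

For thousands of pids, the loop at the bottom could read them from a file instead of the command line. Wrapping each download_pdf call in try/except would let one failed pid be logged without stopping the rest.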