For some reason I can't get this PDF to download: the response's content type is text/html, not application/pdf. The downloaded file is always very small, and it is corrupted when I try to open it.
import requests
docketnumber = '1'
r = requests.get('https://cases.stretto.com/public/X070/10255/PLEADINGS/1025505242280000000049.pdf', allow_redirects=True, headers={'User-Agent': 'Mozilla/5.0'})
print(r.headers.get('content-type'))
open('C:/MyDownloads/' + docketnumber + '.pdf', 'wb').write(r.content)
CodePudding user response:
Try changing the User-Agent:
import requests

r = requests.get(
    "https://cases.stretto.com/public/X070/10255/PLEADINGS/1025505242280000000049.pdf",
    allow_redirects=True,
    headers={
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0"
    },
)
with open("1.pdf", "wb") as f_out:
    f_out.write(r.content)
This saves 1.pdf:
andrej@andrej:~$ ls -alF 1.pdf
-rw-r--r-- 1 root root 243976 máj 30 23:03 1.pdf
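If you want the script to fail loudly instead of silently writing an HTML error page to disk, you could check the body before saving it. This is only a minimal sketch: `looks_like_pdf` and `save_pdf` are hypothetical helper names, not part of requests, and the check assumes the file starts with the `%PDF-` marker at byte 0 (some readers tolerate a few leading junk bytes, which this strict version would reject).

```python
from pathlib import Path
from typing import Optional

# PDF files begin with this marker, e.g. b"%PDF-1.7".
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(body: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    return body.startswith(PDF_MAGIC)


def save_pdf(content_type: Optional[str], body: bytes, path: str) -> Path:
    """Write body to path, refusing payloads that do not look like a PDF."""
    if not looks_like_pdf(body):
        # The content type is only used to make the error message more helpful.
        raise ValueError(
            f"expected a PDF, got content type {content_type!r} "
            f"and {len(body)} bytes that do not start with %PDF-"
        )
    target = Path(path)
    target.write_bytes(body)
    return target


# Hypothetical usage with the response object `r` from the snippet above:
# save_pdf(r.headers.get("content-type"), r.content, "1.pdf")
```

With the original `Mozilla/5.0` User-Agent this would presumably raise a `ValueError` mentioning text/html rather than leaving a small, corrupted 1.pdf behind.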
CodePudding user response:
Andrej has the correct answer above, but if you want a single command line:
curl -A "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0" -O https://cases.stretto.com/public/X070/10255/PLEADINGS/1025505242280000000049.pdf
Result:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  238k  100  238k    0     0   295k      0 --:--:-- --:--:-- --:--:--  295k
>1025505242280000000049.pdf