Convert a TIFF file from a URL into a PDF
I have a TIFF file that I need to download from a URL. When I save it to the current working directory, the download succeeds, but I am unable to open the resulting file.
1. Command used to download the file into cwd:
import urllib.request
sample_tiff_url = "https://www.gati.com/viewPOD2.jsp?dktno=322012982"
urllib.request.urlretrieve(sample_tiff_url, "check.tiff")
My plan was to download the file locally and then convert it into a PDF using
print(type(resp.content))
>>>bytes
3. Another thing I tried:
import img2pdf
import base64
img_content = base64.b64decode(resp.content)
content = img2pdf.convert(img_content)
which gives the following error:
ImageOpenError: cannot read input image (not jpeg2000). PIL: error reading image: cannot identify image file <_io.BytesIO object at 0x7ff46368c410>
I also tried this:
from PIL import Image
import io
pil_bytes = io.BytesIO(resp.content)
pil_image = Image.open(pil_bytes)
Error:
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7ff4642e2650>
4. Lastly:
import requests
from PyPDF2 import PdfFileMerger, PdfFileReader
sample_tiff_url = "https://www.gati.com/viewPOD2.jsp?dktno=322012982"
resp = requests.get(sample_tiff_url, stream=True)
PdfFileReader(resp.content)
To be honest, I have not worked with image libraries or image files before, so I do not understand the errors I am getting.
TL;DR: How can I either download the .tiff file locally, or read the bytes returned by that URL and convert/write them into a PDF?
Answer:
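import io
import requests
from PIL import Image
url = 'https://www.gati.com/viewPOD2.jsp?dktno=322012982'
r = requests.get(url)
# Stop early on an HTTP error instead of trying to parse an error page as an image
r.raise_for_status()
# Wrap the raw response bytes in a file-like object so Pillow can read them
pil_bytes = io.BytesIO(r.content)
pil_image = Image.open(pil_bytes)
# Needed to get around ValueError: cannot save mode RGBA
# Paste the image onto a fresh RGB canvas and save that canvas as a PDF
rgb = Image.new('RGB', pil_image.size)
rgb.paste(pil_image)
rgb.save('downloaded_image.pdf', 'PDF')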
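If Image.open still raises UnidentifiedImageError, it can help to look at what the server actually sent before handing the bytes to an image library. The sketch below is only a diagnostic, not part of the conversion: it checks the first few bytes of the payload against some well-known file signatures (TIFF files start with II*\x00 or MM\x00*). The function name sniff_image_format is made up for this example, and the list of signatures is deliberately short. Note also that resp.content from requests is already raw bytes, so unless the server really returns base64 text, there should be no need for base64.b64decode.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess what a downloaded payload is from its leading bytes.

    Hypothetical helper for debugging; it only knows a handful of signatures.
    """
    head = data[:8]
    # TIFF: little-endian ("II") or big-endian ("MM") header followed by 42
    if head[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    if head[:4] == b"%PDF":
        return "pdf"
    if head[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if head == b"\x89PNG\r\n\x1a\n":
        return "png"
    # An error page or login page usually starts with markup
    if data.lstrip()[:1] == b"<":
        return "html-or-xml"
    return "unknown"


if __name__ == "__main__":
    # Stand-in payloads; in practice you would pass r.content here
    print(sniff_image_format(b"II*\x00\x08\x00\x00\x00"))
    print(sniff_image_format(b"<html><body>Not found</body></html>"))
```

With the code above you would call sniff_image_format(r.content) right after the request. If it reports something other than "tiff" (for example "html-or-xml"), the problem is probably in what the URL returns rather than in Pillow or img2pdf.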