I have a PDF encoded as a Base64 string and I need to read it with a Python library. Normally I could do that with the following steps:
- Decode the Base64 string into the PDF bytes
- Save it into a new file
- Read it with libraries like PyPDF2
But since I can't create a new file, I need another way to read it. I tried using the BufferedWriter class from the io library, but I don't think that is the right approach.
Edit 1
I can't create new files because the code will run on a serverless API host. What I need is to take the Base64 string and read it in a way that lets me split each page into its own file and then save those files to blob storage. The splitting and saving are easy; the problem is reading the Base64 string without creating a new file.
CodePudding user response:
PDF is a binary file format, not a base64 string. Base64 is a way of encoding binary data as ASCII text.
What you need to do is decode the Base64 string with base64.b64decode into a bytes object, then wrap those bytes in an io.BytesIO object so that a PDF library like PyPDF2 can read them as if they were a file:
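import base64
import io
from PyPDF2 import PdfReader

# thatString holds the Base64-encoded PDF
pdf_bytes = base64.b64decode(thatString)
# Wrap the bytes in an in-memory, file-like object (nothing is written to disk)
f = io.BytesIO(pdf_bytes)
reader = PdfReader(f)
page = reader.pages[0]  # first page of the document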
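To get from there to the one-file-per-page split mentioned in the question without touching the disk, something along these lines should work. Treat it as a sketch rather than a tested solution for your host: the helper names b64_to_stream and split_pdf_pages are made up for illustration, and it assumes a PyPDF2 version that provides PdfWriter.add_page (older releases spell it addPage).

```python
import base64
import io
from typing import List


def b64_to_stream(b64_string: str) -> io.BytesIO:
    """Decode a Base64 string into an in-memory, seekable binary stream."""
    return io.BytesIO(base64.b64decode(b64_string))


def split_pdf_pages(b64_string: str) -> List[bytes]:
    """Return one single-page PDF (as bytes) for each page of the encoded PDF."""
    # Imported lazily so b64_to_stream stays usable where PyPDF2 is not installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(b64_to_stream(b64_string))
    pages = []
    for page in reader.pages:
        writer = PdfWriter()
        writer.add_page(page)
        buffer = io.BytesIO()  # in-memory target instead of a file on disk
        writer.write(buffer)
        pages.append(buffer.getvalue())
    return pages
```

Each element of the returned list is a complete PDF on its own, so it can be handed to whatever upload call your blob storage client provides, either as bytes or wrapped in another io.BytesIO.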