I'm working on a project that uses one large PDF file containing hundreds of thousands of images, and I need to insert a custom, variable barcode on every nth page (depending on a condition).
The contents of the barcode will change for every insertion; for this example, let's say they are based on the iteration.
I've used PyMuPDF to manipulate PDFs in the past, including inserting images. I've tested inserting barcodes when they're saved to file, and have no issues.
I've used Treepoem in the past to generate custom barcodes as required, on a much smaller scale.
This is still in the planning/proof-of-concept phase, so my concern is that at a larger scale I'll be limited by disk read/write speeds.
I understand that Python has a tempfile library, which I've never used. I'm attempting to use it to generate barcodes and save them to temporary files in memory, then insert them into the PDF file from memory rather than from disk.
I've tested and confirmed that generating a barcode and saving it to a file allows me to insert it into the PDF file as required. Example below:
import fitz
import treepoem
# save() returns None, so there is nothing useful to assign here.
treepoem.generate_barcode(
    barcode_type='datamatrixrectangular',
    data='10000010'
).convert('1').save('barcode_file.jpg') # convert('1') forces monochrome, reducing file size.
pdf_file = fitz.open() # Creating a new file for this example.
pdf_file.new_page() # Inserting a new blank page.
page = pdf_file[0]
rect = fitz.Rect(70, 155, 200, 230) # Generic area defined, required to insert barcode into. (x0, y0, x1, y1)
page.insert_image(rect, filename='barcode_file.jpg')
pdf_file.save('example_pdf_with_barcode.pdf')
When trying to use tempfile to avoid saving to a file, I'm not sure where it fits in.
I've tried creating a new tempfile object and writing the barcode image into it:
import fitz
import tempfile
import treepoem
barcode_contents = treepoem.generate_barcode(
    barcode_type='datamatrixrectangular',
    data='10000010'
).convert('1')
barcode_tempfile = tempfile.TemporaryFile()
barcode_tempfile.write(b'{barcode_contents}') # Like f-string, with binary?
barcode_tempfile.seek(0) # Required, not understood.
pdf_file = fitz.open() # Creating a new file for this example.
pdf_file.new_page() # Inserting a new blank page.
page = pdf_file[0]
rect = fitz.Rect(70, 155, 200, 230) # Generic area defined, required to insert barcode into. (x0, y0, x1, y1)
page.insert_image(rect, filename=barcode_tempfile)
pdf_file.save('example_pdf_with_barcode.pdf')
This raises a permission error:
File "<redacted>\example.py", line 20, in <module>
page.insert_image(rect, filename=barcode_tempfile)
File "<redacted>\venv\Lib\site-packages\fitz\utils.py", line 352, in insert_image
xref, digests = page._insert_image(
^^^^^^^^^^^^^^^^^^^
File "<redacted>\venv\Lib\site-packages\fitz\fitz.py", line 6520, in _insert_image
return _fitz.Page__insert_image(self, filename, pixmap, stream, imask, clip, overlay, rotate, keep_proportion, oc, width, height, xref, alpha, _imgname, digests)
RuntimeError: cannot open <redacted>\AppData\Local\Temp\tmpr_98wni9: Permission denied
I've looked for that temp file in the specified directory, but it can't be found, so I can't figure out how to troubleshoot this.
Treepoem's barcode generator also has a save() method, where you can typically save to file. I've tried to save to a tempfile instead, as below:
import fitz
import tempfile
import treepoem
treepoem.generate_barcode(
    barcode_type='datamatrixrectangular',
    data='10000010'
).convert('1').save(tempfile.TemporaryFile('barcode_tempfile'))
pdf_file = fitz.open() # Creating a new file for this example.
pdf_file.new_page() # Inserting a new blank page.
page = pdf_file[0]
rect = fitz.Rect(70, 155, 200, 230) # Generic area defined, required to insert barcode into. (x0, y0, x1, y1)
page.insert_image(rect, filename=barcode_tempfile)
pdf_file.save('example_pdf_with_barcode.pdf')
This results in the error below:
File "<redacted>\example.py", line 8, in <module>
).convert('1').save(tempfile.TemporaryFile('barcode_tempfile'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<redacted>\AppData\Local\Programs\Python\Python311\Lib\tempfile.py", line 563, in NamedTemporaryFile
file = _io.open(dir, mode, buffering=buffering,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid mode: 'barcode_tempfile'
So I'm unsure whether I can save to a tempfile via this method.
Would anyone be able to explain whether this is possible and, if so, how best to tackle it?
(Currently using Python 3.11)
Thanks,
CodePudding user response:
Your problems are in the area of tempfile handling. Instead of going into detail, I suggest sticking with Pillow and using its facilities exclusively:
- convert the PIL image to a JPEG / PNG as you did, but let PIL save to a memory file
- insert that memory image using PyMuPDF
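import io  # needed for in-memory output

import fitz  # PyMuPDF
import treepoem

fp = io.BytesIO()  # in-memory binary file
treepoem.generate_barcode(
    barcode_type='datamatrixrectangular',
    data='10000010'
).convert('1').save(fp, "JPEG")  # write the image to memory
pdf_file = fitz.open()  # new PDF, as in the question
page = pdf_file.new_page()  # new_page() returns the page it creates
rect = fitz.Rect(70, 155, 200, 230)
# now insert the image into the page using PyMuPDF;
# fp.getvalue() delivers the image content held in memory
page.insert_image(rect, stream=fp.getvalue())
pdf_file.save('example_pdf_with_barcode.pdf')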