Amateur Python developer here. I'm working on a project where I take multiple PDFs, each with a varying number of pages (roughly 1 to 20), and turn them into PNG files to use with pytesseract later.
I'm using pdf2image and poppler on a test PDF that has 3 pages. The problem is that only the last page of the PDF gets converted to a PNG. I thought, "Maybe the program is using the same file name for each PDF page, and each iteration overwrites the file until only the last page remains." So I tried to change the file name on each iteration. Here's the code:
from pdf2image import convert_from_path
images = convert_from_path('/Users/jacobpatty/vscode_projects/badger_colors/test_ai/10254_Craigs_Plumbing.pdf', 200)
file_name = 'ping_from_ai_test.png'
file_number = 0
for image in images:
    file_number =+ 1
    file_name = 'ping_from_ai_test' + str(file_number) + '.png'
    image.save(file_name)
This failed in two ways. It only made 2 PNG files ('ping_from_ai_test.png' and 'ping_from_ai_test1.png') instead of 3, and when I opened them they were both just the last PDF page again. I don't know what to do at this point. Any ideas?
CodePudding user response:
Your code only ever produces a single file name, as far as I can see. The problem is a typo.
The line
file_number =+ 1
is actually a plain assignment:
file_number = (+1)
so file_number is 1 on every iteration and the file name is always 'ping_from_ai_test1.png'. This should probably be
file_number += 1
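The difference between =+ and += can be checked without any PDF. A minimal sketch, where the three-item pages list is only a stand-in for the converted images and nothing is written to disk:

```python
# Stand-in for the three page images; no PDF or pdf2image needed here.
pages = ["page1", "page2", "page3"]

# Typo version: "=+ 1" is parsed as "= (+1)", so the counter never advances.
wrong_names = []
file_number = 0
for _ in pages:
    file_number =+ 1
    wrong_names.append('ping_from_ai_test' + str(file_number) + '.png')

# Intended version: "+= 1" increments the counter on every pass.
right_names = []
file_number = 0
for _ in pages:
    file_number += 1
    right_names.append('ping_from_ai_test' + str(file_number) + '.png')

print(wrong_names)  # the same name three times
print(right_names)  # three distinct names
```

With the typo, every page is saved under the same name, so each save overwrites the previous one.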
CodePudding user response:
Try this instead of for image in images:
for n in range(len(images)):
    images[n].save('test' + str(n) + '.png')
Does that work?
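The same idea can also be written with enumerate, which avoids managing the counter by hand. This is only a sketch: save_pages, out_dir and prefix are made-up names for illustration, and it assumes each item in images has a save() method that accepts a path, as the PIL images returned by convert_from_path do.

```python
from pathlib import Path


def save_pages(images, out_dir=".", prefix="page"):
    """Save each page image to its own numbered PNG and return the paths.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # enumerate(..., start=1) yields 1, 2, 3, ... alongside each image.
    for number, image in enumerate(images, start=1):
        target = out_dir / f"{prefix}{number}.png"
        image.save(target)
        saved.append(target)
    return saved


# Possible usage with pdf2image (not run here; the path is a placeholder):
# from pdf2image import convert_from_path
# images = convert_from_path('your_file.pdf', 200)
# save_pages(images, prefix='ping_from_ai_test')
```

Numbering from 1 keeps the file names in line with the PDF's page numbers; drop start=1 if you prefer zero-based names.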