Hey guys, I'm new here.
Program explanation: I'm currently working on a Python program that unzips folders containing PDFs into a "temp" folder. It then splits the PDFs into single-page PDFs and sorts them into folders on another path ("/Georeferenzieren/0", "/Georeferenzieren/1", ...) depending on the page number. For this specific part of the code I followed this guy's tutorial.
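The unzip step itself isn't shown here. If it uses the standard zipfile module, the same rule about open files applies to the archives too. A minimal sketch under that assumption (unzip_to_temp is a hypothetical name, not the asker's code): each archive is opened in a with block, so it is closed right after extraction and does not stay locked.

```python
import os
import tempfile
import zipfile


def unzip_to_temp(zip_paths, temp_dir):
    """Extract each archive in zip_paths into temp_dir; return the top-level PDF names.

    Hypothetical helper: every ZipFile is opened in a `with` block, so the
    archive is closed right after extraction instead of staying open.
    """
    os.makedirs(temp_dir, exist_ok=True)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(temp_dir)
    return sorted(n for n in os.listdir(temp_dir) if n.lower().endswith(".pdf"))


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    archive = os.path.join(base, "input.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("doc.pdf", b"%PDF-1.4 dummy")  # placeholder bytes, not a real PDF
    print(unzip_to_temp([archive], os.path.join(base, "temp")))  # ['doc.pdf']
```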
Problem: That all works perfectly fine, but when I try to delete the temp folder, I get an error saying that the first file in the folder is still being used by another process.
Original error (German Windows):
PermissionError: [WinError 32] Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird: 'e:\Intern\Programmieren\Python for Work\Testumgebung\temp\20220524_0109_V01_Auskunft_01_A3_H.pdf'
English translation:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'e:\Intern\Programmieren\Python for Work\Testumgebung\temp\20220524_0109_V01_Auskunft_01_A3_H.pdf'
What I tried:
- Restarting my PC to make sure no other program is using the PDF. The error still occurred, so I think the PDF is being held open by the Python script itself.
- Adding pdf.close() (see code below).
- Adding Pdf.merger.close(), as this guy suggested, which didn't work since I'm not using the merger (see code below).
Code:
import os
import shutil
import time
from pikepdf import Pdf

# variables from the code before this passage
run = 0
# raw strings, so backslashes (e.g. the "\t" in "\temp") are not read as escape sequences
tempPath = r"E:\Intern\Programmieren\Python for Work\Testumgebung\temp"
geoPath = r"E:\Intern\Programmieren\Python for Work\Testumgebung\Georeferenzieren"

# splitting up PDFs into single-page PDFs
# https://www.thepythoncode.com/article/split-pdf-files-in-python
print("###Split up PDFs")
file2pages = {
    0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 4], 4: [4, 5],
}
for root, directories, files in os.walk(tempPath):
    for file in files:
        run += 1
        filePath = os.path.join(tempPath, file)
        pdf = Pdf.open(filePath)
        newPdfFiles = [Pdf.new() for i in file2pages]
        newPdfIndex = 0
        for n, page in enumerate(pdf.pages):
            if n in list(range(*file2pages[newPdfIndex])):
                newPdfFiles[newPdfIndex].pages.append(page)
            else:
                # make a unique filename based on original file name plus the index
                pdfPath = os.path.join(geoPath, str(newPdfIndex), file)
                outputFilename = f"{pdfPath}-{newPdfIndex}.pdf"
                # save the PDF file
                newPdfFiles[newPdfIndex].save(outputFilename)
                newPdfIndex += 1
                # add the `n` page to the `newPdfIndex` file
                newPdfFiles[newPdfIndex].pages.append(page)
        # save last PDF file
        pdfPath = os.path.join(geoPath, str(newPdfIndex), file)
        outputFilename = f"{pdfPath}-{newPdfIndex}.pdf"
        newPdfFiles[newPdfIndex].save(os.path.join(geoPath, "1. Page", outputFilename))
        print(f"Splitting up the {run}. PDF.")
        #pdf.close()
        #Pdf.merger.close()

# deleting temp folder
print(end="\n")
print("#Removing temp folder...")
shutil.rmtree(tempPath)
time.sleep(2)
Question: Is there a way to close the PDFs I used before deleting the folder?
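The general pattern is to close every file handle before deleting the folder that contains it. Below is a stdlib-only sketch of that pattern, using plain open() as a stand-in for Pdf.open() (copy_then_delete is a hypothetical helper, and it assumes the source folder contains only files). In the pikepdf code, the equivalent would be opening each PDF with "with Pdf.open(filePath) as pdf:" or calling pdf.close() at the end of every loop iteration, after the new files have been saved.

```python
import os
import shutil
import tempfile


def copy_then_delete(src_dir, dst_dir):
    """Copy every file in src_dir into dst_dir, then delete src_dir.

    Stand-in for the split-then-delete flow (assumes src_dir holds only files):
    each source file is opened in a `with` block, so its handle is closed
    before shutil.rmtree() runs.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        with open(src, "rb") as f:  # closed automatically when the block exits
            data = f.read()
        with open(os.path.join(dst_dir, name), "wb") as out:
            out.write(data)
        copied.append(name)
    shutil.rmtree(src_dir)  # no handle into src_dir is open at this point
    return copied


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    temp = os.path.join(base, "temp")
    os.makedirs(temp)
    for n in ("a.pdf", "b.pdf"):
        with open(os.path.join(temp, n), "wb") as f:
            f.write(b"%PDF-1.4 dummy")
    print(copy_then_delete(temp, os.path.join(base, "Georeferenzieren")))  # ['a.pdf', 'b.pdf']
    print(os.path.exists(temp))  # False
```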
Response:
So I tried your code and ran into the following issues.
- For example, file_path = temp_path.joinpath(file) doesn't work for some reason, even though I used from pathlib import Path, so I went back to file_path = os.path.join(tempPath, file).
- The line for pdf_index, pages_range in file2pages.items(): iterates over the entries in file2pages = {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 4], 4: [4, 5]}. That means the code only works for PDFs with 5 pages.
In my case there are always 3 pages, but I don't want to hard-code that, so I tried creating file2pages based on the number of pages in each document, like so:
file2pages = {f"{i}: [{i},{i+1}]" for i in range(pdf_reader.numPages)}
The output of this is {'1: [1,2]', '0: [0,1]', '2: [2,3]'}, but for pdf_index, pages_range in file2pages.items(): doesn't accept it and gives me the following error:
AttributeError: 'set' object has no attribute 'items'
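Two Python details explain the errors described above. Braces without a key: value pair create a set, so the f-string comprehension produces a set of strings, which has no .items(); writing {i: [i, i + 1] ...} produces the dict the loop expects. And joinpath() is a method of pathlib.Path, not of str, so temp_path has to be a Path object (for example Path(tempPath)) for temp_path.joinpath(file) to work. A short demonstration (build_file2pages is a hypothetical helper name):

```python
from pathlib import Path


def build_file2pages(num_pages):
    """Hypothetical helper: map output index i to the page range [i, i + 1)."""
    return {i: [i, i + 1] for i in range(num_pages)}


# No key: value pair inside the braces -> a set (here, a set of strings).
as_set = {f"{i}: [{i},{i + 1}]" for i in range(3)}
print(type(as_set).__name__)  # set, so as_set.items() raises AttributeError

# key: value pairs -> a dict, which has .items().
file2pages = build_file2pages(3)
for pdf_index, pages_range in file2pages.items():
    print(pdf_index, list(range(*pages_range)))  # 0 [0], 1 [1], 2 [2] on separate lines

# joinpath() is a pathlib.Path method; plain strings do not have it.
temp_path = Path("temp")  # e.g. Path(tempPath) to wrap an existing string path
print(temp_path.joinpath("a.pdf").name)  # a.pdf
```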
The code currently looks like this:
import os
import shutil
import time
#from pathlib import Path
from pikepdf import Pdf
import PyPDF2

# assuming temp and geo are present in your project dir
#temp_path = Path(".").joinpath("temp")
#geo_path = Path(".").joinpath("geo")
# tempPath and geoPath are the strings defined earlier in the script

# splitting up PDFs into single-page PDFs
# https://www.thepythoncode.com/article/split-pdf-files-in-python
print("###Split up PDFs\n")
for root, directories, files in os.walk(tempPath):
    for file in files:
        file_path = os.path.join(tempPath, file)
        print(f'processing file {file_path}')
        # HERE
        with Pdf.open(file_path) as pdf:
            pdf_reader = PyPDF2.PdfFileReader(os.path.join(tempPath, file))
            # note: this builds a set of strings, not a dict (hence the AttributeError)
            file2pages = {f"{i}: [{i},{i+1}]" for i in range(pdf_reader.numPages)}
            print(file2pages)
            for pdf_index, pages_range in file2pages.items():
                new_pdf = Pdf.new()
                for i in range(*pages_range):
                    new_pdf.pages.append(pdf.pages[i])
                output_path = os.path.join(geoPath, str(pdf_index))
                #output_path.mkdir(parents=True, exist_ok=True)
                out_filename = os.path.join(output_path, f'{file}-{pdf_index}.pdf')
                print(f'saving {pages_range[0] + 1}-{pages_range[1]} pages as {out_filename}')
                new_pdf.save(out_filename)

# deleting temp folder
print()
print("#Removing temp folder...")
shutil.rmtree(tempPath)
time.sleep(2)
Response:
Here's cleaner code that does the same thing. Tested on my machine.
import os
import shutil
import time
from pathlib import Path
from pikepdf import Pdf

# assuming temp and geo are present in your project dir
temp_path = Path(".").joinpath("temp")
geo_path = Path(".").joinpath("geo")

# splitting up PDFs into single-page PDFs
# https://www.thepythoncode.com/article/split-pdf-files-in-python
print("###Split up PDFs\n")
for root, directories, files in os.walk(temp_path):
    for file in files:
        file_path = temp_path.joinpath(file)
        print(f'processing file {file_path}')
        # HERE: the `with` block closes the source PDF when it ends,
        # so the file is no longer in use when rmtree runs below
        with Pdf.open(file_path) as pdf:
            # one entry per page: output index i -> page range [i, i + 1)
            file2pages = {i: [i, i + 1] for i in range(len(pdf.pages))}
            for pdf_index, pages_range in file2pages.items():
                new_pdf = Pdf.new()
                for i in range(*pages_range):
                    new_pdf.pages.append(pdf.pages[i])
                output_path = geo_path.joinpath(str(pdf_index))
                output_path.mkdir(parents=True, exist_ok=True)
                out_filename = output_path.joinpath(f'{file}-{pdf_index}.pdf')
                print(f'saving {pages_range[0] + 1}-{pages_range[1]} pages as {out_filename}')
                new_pdf.save(out_filename)

# deleting temp folder
print()
print("#Removing temp folder...")
shutil.rmtree(temp_path)
time.sleep(2)
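Your code seems to have numerous issues, so I re-implemented it. Observe the use of with: each source PDF is closed as soon as its with block ends, so the files are no longer in use when shutil.rmtree deletes the temp folder.
If you still get an error, you might want to check the file permissions on your machine.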
EDIT: Since the requirement is to turn each page of the PDF into a separate PDF, I updated the code to work with PDFs that have a variable number of pages.
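The page-to-folder mapping in the updated code can be checked without touching any PDFs. A sketch assuming the output layout stays as in the code above (page i of a file goes to geo/<i>/<file>-<i>.pdf); plan_outputs is a hypothetical helper that only computes the paths and writes nothing:

```python
from pathlib import Path


def plan_outputs(file_name, num_pages, geo_dir):
    """Hypothetical helper: list (page_index, output_path) pairs for one PDF.

    Mirrors the naming in the answer's code: page i of <file_name> goes to
    <geo_dir>/<i>/<file_name>-<i>.pdf, for however many pages the PDF has.
    """
    geo = Path(geo_dir)
    return [(i, geo / str(i) / f"{file_name}-{i}.pdf") for i in range(num_pages)]


for index, path in plan_outputs("plan.pdf", 3, "geo"):
    print(index, path.as_posix())
# 0 geo/0/plan.pdf-0.pdf
# 1 geo/1/plan.pdf-1.pdf
# 2 geo/2/plan.pdf-2.pdf
```

Because the number of entries comes from the page count, a 3-page PDF fills folders 0 to 2 and a 5-page PDF fills folders 0 to 4, with no hard-coded mapping.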