Deleting a directory fails because of an opened PDF file

Hey guys I'm new here,

Program explanation: I'm currently working on a Python program that unzips folders containing PDFs into a "temp" folder. It then splits each PDF into single-page PDFs and sorts them into folders on another path ("/Georeferenzieren/0", "/Georeferenzieren/1", ...) depending on the page number. For this specific part of the code I followed the tutorial linked in the code comment below.

Problem: That all works perfectly fine, but when I try to delete the temp folder, an error is displayed that the first file of the folder is still being used by another process.

(ger)

PermissionError: [WinError 32] Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird: 'e:\Intern\Programmieren\Python for Work\Testumgebung\temp\20220524_0109_V01_Auskunft_01_A3_H.pdf'

(en)

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'e:\Intern\Programmieren\Python for Work\Testumgebung\temp\20220524_0109_V01_Auskunft_01_A3_H.pdf'

What I tried:

Restarting the PC to make sure no other program is using the PDF. I think the error still occurred because the PDF is opened by the Python script itself.
Adding pdf.close() (please see the code below).
Adding Pdf.merger.close(), as suggested elsewhere, which didn't work since I'm not using the merger (please see the code below).

Code:

#variables from the code before this passage
run = 0
# raw strings, so that "\t" in "\temp" is not read as a tab character
tempPath = r"E:\Intern\Programmieren\Python for Work\Testumgebung\temp"
geoPath = r"E:\Intern\Programmieren\Python for Work\Testumgebung\Georeferenzieren"

#splitting up PDFs into single-page PDFs
#https://www.thepythoncode.com/article/split-pdf-files-in-python
print("###Split up PDFs")

file2pages = {
    0: [0,1], 1: [1,2], 2: [2,3], 3: [3,4], 4: [4,5],
}

for root, directories, file in os.walk(tempPath):
    for file in file:
        run = run + 1
        filePath = os.path.join(tempPath, file)
        pdf = Pdf.open(filePath)
        newPdfFiles = [Pdf.new() for i in file2pages]
        newPdfIndex = 0

        for n, page in enumerate(pdf.pages):
            if n in list(range(*file2pages[newPdfIndex])):
                newPdfFiles[newPdfIndex].pages.append(page)
            else:
                # make a unique filename based on original file name plus the index
                pdfPath = os.path.join(geoPath, str(newPdfIndex), file)
                outputFilename = f"{pdfPath}-{newPdfIndex}.pdf"

                # save the PDF file
                newPdfFiles[newPdfIndex].save(outputFilename)

                newPdfIndex += 1
                # add the `n` page to the `newPdfIndex` file
                newPdfFiles[newPdfIndex].pages.append(page)

        #save last PDF file
        pdfPath = os.path.join(geoPath, str(newPdfIndex), file)
        outputFilename = f"{pdfPath}-{newPdfIndex}.pdf"
        newPdfFiles[newPdfIndex].save(os.path.join(geoPath, "1. Page", outputFilename))
        print(f"Splitting up the {run}. PDF.")

        #pdf.close()
        #Pdf.merger.close()

#deleting temp folder
print(end = "\n")
print("#Removing temp folder...")
shutil.rmtree(tempPath)
time.sleep(2)

Question: Is there a way to close the opened PDF before deleting the folder?

CodePudding user response：

So I tried your code and ran into the following issues.

For example, file_path = temp_path.joinpath(file) somehow doesn't work even though I used from pathlib import Path, so I went back to file_path = os.path.join(tempPath, file).
The line for pdf_index, pages_range in file2pages.items(): iterates over the entries of file2pages = {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 4], 4: [4, 5]}. That means the code only works for PDFs with 5 pages.
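
One possible reason for the joinpath problem, assuming tempPath was still a plain string at that point (the post does not say): a str has no joinpath method, only a Path does. A minimal sketch with hypothetical names and values:

```python
from pathlib import Path

# Hypothetical values for illustration; the real ones come from earlier code.
temp_path_str = "temp"       # a plain str, which is what os.path.join expects
file_name = "example.pdf"    # hypothetical file name

# A str has no joinpath method, so calling it raises AttributeError.
str_has_joinpath = hasattr(temp_path_str, "joinpath")

# Wrapping the string in Path provides joinpath and the "/" operator.
temp_path = Path(temp_path_str)
file_path = temp_path.joinpath(file_name)
same_path = temp_path / file_name

print(str_has_joinpath)
print(file_path == same_path)
```

If this is indeed the cause, converting once with Path(tempPath) right after the variable is defined would let the rest of the code use joinpath throughout.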

In my case there are always 3 pages, but I don't want to hard-code that, so I tried creating file2pages from the number of pages in each document, like so: file2pages = {f"{i}: [{i},{i + 1}]" for i in range(pdf_reader.numPages)}. The output for this is {'1: [1,2]', '0: [0,1]', '2: [2,3]'}, but for pdf_index, pages_range in file2pages.items(): does not accept it and gives me the following error:

AttributeError: 'set' object has no attribute 'items'
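
The error above is consistent with how comprehensions work: braces around a single expression (here one f-string) build a set of strings, while braces around key: value build a dict. A small sketch of the difference, using a hypothetical page count of 3:

```python
page_count = 3  # hypothetical page count for illustration

# Set comprehension: each element is one formatted string,
# so the result is a set and has no .items() method.
as_set = {f"{i}: [{i},{i + 1}]" for i in range(page_count)}

# Dict comprehension: "key: value" written outside a string builds a dict.
file2pages = {i: [i, i + 1] for i in range(page_count)}

print(type(as_set).__name__)  # set
print(file2pages)             # {0: [0, 1], 1: [1, 2], 2: [2, 3]}

# The dict can be iterated with .items(), and each value unpacks into range().
for pdf_index, pages_range in file2pages.items():
    print(pdf_index, list(range(*pages_range)))
```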

The code currently looks like this:

import os
import shutil
import time
#from pathlib import Path

from pikepdf import Pdf

#assuming temp and geo are present in your project dir
#temp_path = Path(".").joinpath("temp")
#geo_path = Path(".").joinpath("geo")

# splitting up PDFs into single-page PDFs
#https://www.thepythoncode.com/article/split-pdf-files-in-python
print("###Split up PDFs\n")

for root, directories, files in os.walk(tempPath):
    for file in files:
        file_path = os.path.join(tempPath, file)
        print(f'processing file {file_path}')
        
        # HERE
        with Pdf.open(file_path) as pdf:
            pdf_reader = PyPDF2.PdfFileReader(os.path.join(tempPath, file))
            file2pages = {f"{i}: [{i},{i + 1}]" for i in range(pdf_reader.numPages)}
            print(file2pages)

            for pdf_index, pages_range in file2pages.items():
                new_pdf = Pdf.new()
                [new_pdf.pages.append(pdf.pages[i]) for i in range(*pages_range)]
                output_path = os.path.join(geoPath, str(pdf_index))
                #output_path.mkdir(parents=True, exist_ok=True)
                out_filename = os.path.join(output_path, (f'{file}-{pdf_index}.pdf'))
                print(f'saving {pages_range[0] + 1}-{pages_range[1]} pages as {out_filename}')
                new_pdf.save(out_filename)
                
# deleting temp folder
print()
print("#Removing temp folder...")
shutil.rmtree(tempPath)
time.sleep(2)

CodePudding user response：

Here's a cleaner version that does the same thing. Tested on my machine.

import os
import shutil
import time
from pathlib import Path

from pikepdf import Pdf

# assuming temp and geo are present in your project dir
temp_path = Path(".").joinpath("temp")
geo_path = Path(".").joinpath("geo")

# splitting up PDFs into single-page PDFs
# https://www.thepythoncode.com/article/split-pdf-files-in-python
print("###Split up PDFs\n")

for root, directories, files in os.walk(temp_path):
    for file in files:
        file_path = temp_path.joinpath(file)
        print(f'processing file {file_path}')

        # HERE
        with Pdf.open(file_path) as pdf:
            file2pages = {i: [i, i + 1] for i in range(len(pdf.pages))}
            for pdf_index, pages_range in file2pages.items():
                new_pdf = Pdf.new()
                [new_pdf.pages.append(pdf.pages[i]) for i in range(*pages_range)]
                output_path = geo_path.joinpath(str(pdf_index))
                output_path.mkdir(parents=True, exist_ok=True)
                out_filename = output_path.joinpath(f'{file}-{pdf_index}.pdf')
                print(f'saving {pages_range[0] + 1}-{pages_range[1]} pages as {out_filename}')
                new_pdf.save(out_filename)

# deleting temp folder
print()
print("#Removing temp folder...")
shutil.rmtree(temp_path)
time.sleep(2)

Your code had several issues, so I re-implemented it. Note the use of with: the PDF is closed as soon as the with block exits, so no handle to the file is still open when shutil.rmtree runs.

If you still get the error, you might want to check the file permissions on your machine.

EDIT: Based on the requirement to convert each page of the PDF into a separate PDF, I updated the code to work with PDFs that have a variable number of pages.
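
The close-before-delete idea can be tried without any PDF library. The following is only a standard-library sketch: a plain file handle stands in for the Pdf object, and the file name is hypothetical. It illustrates the pattern, not pikepdf's internals.

```python
import shutil
import tempfile
from pathlib import Path

# Stand-in for the temp folder; mkdtemp creates a fresh, empty directory.
temp_dir = Path(tempfile.mkdtemp())
sample = temp_dir / "sample.bin"  # hypothetical file standing in for a PDF
sample.write_bytes(b"placeholder content")

# The with block closes the handle on exit, even if an exception is raised.
with open(sample, "rb") as handle:
    first_bytes = handle.read(4)

handle_closed = handle.closed  # True once the with block has exited

# No handle is open anymore, so the directory can be removed.
shutil.rmtree(temp_dir)
dir_removed = not temp_dir.exists()

print(handle_closed, dir_removed)
```

On Windows, removing a directory while a file inside it is still open is what typically triggers WinError 32, which is why the handle has to be released first.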