I am writing a program that renames a series of PDFs in a specific directory based on their contents. I have the contents extracted into a string, but os.rename() cannot change the name because the file is apparently still open. I found a nearly identical solution, but I have not been able to implement it properly. I feel like I'm close to working functionality, but I don't know where to put load_pdf.close(), or whether I need to phrase it differently. Wherever I put it, I get either the same error or some other error that stops the script.
import PyPDF2
import os
for file_name in os.listdir('upload_12.5.22_test'):
    load_pdf = open('C:/Users/Jake/Documents/upload_12.5.22_test/' + file_name, 'rb')
    read_pdf = PyPDF2.PdfFileReader(load_pdf)
    page_count = read_pdf.getNumPages()
    data_page = read_pdf.getPage(0)
    page_content = data_page.extractText()
    page_content = page_content.replace('\n', '')
    page_content = page_content.split('reports.')
    del page_content[0:1]
    p_c_str = ''.join(page_content)
    p_c_str = p_c_str.strip()
    p_c_str = p_c_str[:-6]
    p_c_str = p_c_str + " agreement"
    load_pdf.close()
    os.rename('C:/Users/Jake/Documents/upload_12.5.22_test/' + file_name, 'C:/Users/Jake/Documents/upload_12.5.22_test/' + p_c_str + ".pdf")
ERROR:
Traceback (most recent call last):
  File "C:\Users\Jake\Documents\progam1.py", line 30, in <module>
    os.rename('C:/Users/Jake/Documents/upload_12.5.22_test/' + file_name, 'C:/Users/Jake/Documents/upload_12.5.22_test/' + p_c_str + ".pdf")
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:/Users/Jake/Documents/upload_12.5.22_test/First M Last agreement.pdf' -> 'C:/Users/Jake/Documents/upload_12.5.22_test/First Last agreement.pdf'
[Finished in 456ms]
UPDATE: I found this solution, which seems to address the issue. So I am calling os.rename() improperly and trying to change the active file directory. I moved everything into a single directory to rule out any errors from linking to the full source path, but it's still throwing the exact same error, so I'm back where I started.
import PyPDF2
import os
for file_name in os.listdir():
    load_pdf = open(file_name, 'rb')
    read_pdf = PyPDF2.PdfFileReader(load_pdf)
    page_count = read_pdf.getNumPages()
    data_page = read_pdf.getPage(0)
    page_content = data_page.extractText()
    page_content = page_content.replace('\n', '')
    page_content = page_content.split('reports.')
    del page_content[0:1]
    p_c_str = ''.join(page_content)
    p_c_str = p_c_str.strip()
    p_c_str = p_c_str[:-6]
    p_c_str = p_c_str + " agreement"
    load_pdf.close()
    os.rename(file_name, p_c_str + ".pdf")
ERROR:
Traceback (most recent call last):
  File "C:\Users\Jake\Documents\Work Projects\Python\Contract Extraction\upload_12.5.22_test\contract_extraction_testing_2.py", line 28, in <module>
    os.rename(file_name, p_c_str + ".pdf")
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'Alan R Diaz agreement.pdf' -> 'Alan Diaz agreement.pdf'
[Finished in 205ms]
Answer:
My suggestion would be to use a context manager, which will close the file as soon as you're done reading it:
with open('C:/Users/Jake/Documents/upload_12.5.22_test/' + file_name, 'rb') as load_pdf:
    read_pdf = PyPDF2.PdfFileReader(load_pdf)
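Applied to the whole loop, the pattern might look like the sketch below. It is only an illustration under some assumptions: rename_by_content and new_stem_for are hypothetical names, and new_stem_for stands in for the PyPDF2 text extraction from the question (it receives the open file and returns the new name without the extension). Everything that reads the file happens inside the with block, and os.rename() is only called after the block has exited and the handle is closed. The sketch also skips anything that is not a .pdf and leaves a file alone if the target name is already taken.

```python
import os
from pathlib import Path
from typing import BinaryIO, Callable, List, Tuple


def rename_by_content(folder, new_stem_for: Callable[[BinaryIO], str]) -> List[Tuple[str, str]]:
    """Rename every .pdf in folder to whatever new_stem_for returns for it.

    new_stem_for is a hypothetical callback: it gets the open binary file
    and returns the new file name without the ".pdf" extension.
    """
    renamed = []
    # sorted() takes a snapshot of the listing before any renaming starts.
    for path in sorted(Path(folder).glob("*.pdf")):
        # Read everything needed while the file is open...
        with path.open("rb") as load_pdf:
            new_stem = new_stem_for(load_pdf)
        # ...and rename only here, after the with block has closed the handle.
        target = path.with_name(new_stem + ".pdf")
        if target == path or target.exists():
            continue  # nothing to do, or the new name is already taken
        os.rename(path, target)
        renamed.append((path.name, target.name))
    return renamed
```

In the question's case the callback would wrap the PdfFileReader, getPage(0) and extractText() calls and the string clean-up, returning p_c_str.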
However, your code should have been correct as shown, since you close the file before you try to rename it. As others have said in the comments, if the file is open in any other process, Windows won't let you rename it, so the most likely fix is closing whatever other application has it open, or maybe even rebooting.
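If the lock comes from another process, the script cannot release it, but it can at least avoid crashing halfway through the folder. The following is a rough sketch, not a fix: rename_with_retry is a hypothetical helper, and the attempts and delay defaults are arbitrary. It catches the PermissionError shown in the traceback, waits briefly in case the other program lets go of the file, and reports failure instead of raising.

```python
import os
import time


def rename_with_retry(src, dst, attempts=3, delay=0.5):
    """Try os.rename() up to `attempts` times; return True on success.

    A PermissionError (what WinError 32 surfaces as in Python) is treated
    as "file is locked right now"; other errors are left to propagate.
    """
    for attempt in range(attempts):
        try:
            os.rename(src, dst)
            return True
        except PermissionError:
            # Wait before trying again, but not after the final attempt.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

In the loop, a False return could be used to print the file name and move on, which also makes it easier to see whether every file is locked or only the one currently open in another program.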