I am validating files that I download from a URL. One of the checks is the page count of a PDF. Using the PyPDF2 package and its PdfFileReader class, this worked until I encountered a PDF with 256-bit AES encryption that has a permissions password but no document open password. I have no access to any passwords, since these files come from manufacturer websites, so I decided that for now I would just check whether the PDF is encrypted and skip it if it is. However, whether I try to retrieve the page count or check whether the PDF is encrypted, I get this error:
DependencyError: PyCryptodome is required for AES algorithm
This error occurs at line 6, the if statement.
This is despite having pycryptodome installed and the AES module imported. Also, I am using Jupyter Notebook. Here is my code:
! pip install PyPDF2
! pip install pycryptodome
from PyPDF2 import PdfFileReader
from Crypto.Cipher import AES
if PdfFileReader('Media Downloaded Files/spk-10-3144 bro.pdf').isEncrypted:
    print('This file is encrypted.')
else:
    print(PdfFileReader('Media Downloaded Files/spk-10-3144-bro.pdf').numPages)
Solution:
! pip install pikepdf
from pikepdf import Pdf
pdf = Pdf.open('Media Downloaded Files/spk-10-3144-bro.pdf')
len(pdf.pages)
CodePudding user response:
I had a problem with encryption when using PyPDF3 (a fork of PyPDF2). I solved it by replacing it with pikepdf, which implements more encryption algorithms. Try it out!