I have a PDF file with lots of duplicate pages that I want to remove. This is my code:
pdf_reader = PyPDF2.PdfFileReader(filename_path)
print(pdf_reader.getNumPages())
pdf_writer = PyPDF2.PdfFileWriter()
last_page_n = pdf_reader.getNumPages() - 1
megalist1 = []
for i in range(last_page_n):
    current_page = pdf_reader.getPage(i)
    megalist1.append(current_page)
res = []
[res.append(x) for x in megalist1 if x not in res]
print(len(megalist1))
It doesn't generate any error, but it doesn't work either. What am I doing wrong?
CodePudding user response:
Using a list comprehension purely for its side effects is poor style, and your code builds res but then never does anything with it (nothing is ever added to pdf_writer or written to disk). You could perform the duplicate check when adding to your original list instead, i.e.:
megalist1 = []
# Use getNumPages() here: range(last_page_n) would skip the last page
for i in range(pdf_reader.getNumPages()):
    current_page = pdf_reader.getPage(i)
    if current_page not in megalist1:
        megalist1.append(current_page)
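Two caveats: this only catches pages whose underlying page objects compare equal, and it still never writes anything out. If the duplicates are visually identical copies rather than the same object, comparing the extracted text is usually more reliable. A minimal sketch, assuming the pages contain extractable (non-scanned) text and using a made-up output filename:

import hashlib

import PyPDF2

pdf_reader = PyPDF2.PdfFileReader(filename_path)
pdf_writer = PyPDF2.PdfFileWriter()

seen = set()
for i in range(pdf_reader.getNumPages()):
    page = pdf_reader.getPage(i)
    # Fingerprint each page by hashing its extracted text
    fingerprint = hashlib.md5(page.extractText().encode("utf-8")).hexdigest()
    if fingerprint not in seen:
        seen.add(fingerprint)
        pdf_writer.addPage(page)

# "deduplicated.pdf" is just an example output name
with open("deduplicated.pdf", "wb") as out:
    pdf_writer.write(out)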
CodePudding user response:
Here's one way to fix your code:
pdf_reader = PyPDF2.PdfFileReader(filename_path)
pdf_writer = PyPDF2.PdfFileWriter()

# Create an empty list to store unique pages
unique_pages = []

# Iterate through each page in the PDF
for i in range(pdf_reader.getNumPages()):
    current_page = pdf_reader.getPage(i)
    # Check if the current page is already in the unique_pages list
    if current_page not in unique_pages:
        # If not, add it to the list
        unique_pages.append(current_page)
        # And also add it to the output PDF
        pdf_writer.addPage(current_page)

# Write the output PDF to a new file
with open("output.pdf", "wb") as out:
    pdf_writer.write(out)
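Note that PdfFileReader, PdfFileWriter, getNumPages(), getPage() and addPage() belong to the legacy PyPDF2 API, which is deprecated and has been removed in PyPDF2 3.x / pypdf. On a recent version, the same idea looks roughly like this sketch (on PyPDF2 3.x the import would be PyPDF2 instead of pypdf):

from pypdf import PdfReader, PdfWriter

reader = PdfReader(filename_path)
writer = PdfWriter()

unique_pages = []
# reader.pages replaces getNumPages()/getPage()
for page in reader.pages:
    if page not in unique_pages:
        unique_pages.append(page)
        writer.add_page(page)

with open("output.pdf", "wb") as out:
    writer.write(out)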