Remove duplicate pages from a PDF

Time:01-24

I have a PDF file with lots of duplicate pages that I want to remove. This is my code:

pdf_reader = PyPDF2.PdfFileReader(filename_path)
print(pdf_reader.getNumPages())
pdf_writer = PyPDF2.PdfFileWriter()
last_page_n = pdf_reader.getNumPages() - 1

megalist1 =[]
for i in range(last_page_n):
    current_page = pdf_reader.getPage(i)
    megalist1.append(current_page)

res = []
[res.append(x) for x in megalist1 if x not in res]
print(len(megalist1))

It doesn't raise any error, but it doesn't work either. What am I doing wrong?

CodePudding user response:

A list comprehension used only for its side effects is unidiomatic: it does run, but it also builds a throwaway list of None values. You could instead perform the duplicate check while building your original list, i.e.:

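megalist1 = []
# Loop over every page; range(last_page_n) would skip the final one
for i in range(pdf_reader.getNumPages()):
    current_page = pdf_reader.getPage(i)
    # Only keep pages that have not been collected yet
    if current_page not in megalist1:
        megalist1.append(current_page)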
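
To illustrate the difference between the two approaches, here is a minimal sketch with plain strings standing in for pages (`dedupe_loop`, `dedupe_comprehension`, and the sample `pages` list are made-up names, not part of PyPDF2). Both produce the same order-preserving result; the comprehension additionally builds a list of None values that is thrown away, which is why the plain loop is usually preferred.

```python
def dedupe_loop(items):
    """Keep the first occurrence of each item, preserving order."""
    unique = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return unique


def dedupe_comprehension(items):
    """Same result, but via a comprehension used only for its side effect."""
    unique = []
    # list.append returns None, so this list holds one None per kept item
    throwaway = [unique.append(item) for item in items if item not in unique]
    return unique, throwaway


# Hypothetical stand-ins for page objects
pages = ["cover", "intro", "cover", "body", "intro"]

print(dedupe_loop(pages))
unique, throwaway = dedupe_comprehension(pages)
print(unique, throwaway)
```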

CodePudding user response:

Here's one way to fix your code:

import PyPDF2  # this code uses the legacy PyPDF2 (< 3.0) method names

pdf_reader = PyPDF2.PdfFileReader(filename_path)
pdf_writer = PyPDF2.PdfFileWriter()

# Create an empty list to store unique pages
unique_pages = []

# Iterate through each page in the PDF
for i in range(pdf_reader.getNumPages()):
    current_page = pdf_reader.getPage(i)
    # Check if the current page is already in the unique_pages list
    if current_page not in unique_pages:
        # If not, add it to the list
        unique_pages.append(current_page)
        # And also add it to the output PDF
        pdf_writer.addPage(current_page)

# Write the output PDF to a new file
with open("output.pdf", "wb") as out:
    pdf_writer.write(out)
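
One caveat: `in` compares the page objects themselves, and two pages that look identical can still be stored as separate objects in the PDF, in which case they may not compare equal. A possible alternative is to compare a fingerprint of each page's content instead. The sketch below assumes the legacy PyPDF2 (< 3.0) API used in this thread and uses the extracted text as the fingerprint. That is only an approximation, since scanned or image-only pages have no text and would all collapse into a single page. The helper names `first_occurrences`, `text_fingerprint`, and `dedupe_pdf` are made up for illustration.

```python
import hashlib


def first_occurrences(fingerprints):
    """Return the indices of the first occurrence of each fingerprint, in order."""
    seen = set()
    keep = []
    for index, fingerprint in enumerate(fingerprints):
        if fingerprint not in seen:
            seen.add(fingerprint)
            keep.append(index)
    return keep


def text_fingerprint(text):
    """Hash the page text, ignoring differences in whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_pdf(src_path, dst_path):
    """Copy src_path to dst_path, skipping pages whose text was already seen."""
    import PyPDF2  # imported lazily; assumes the legacy PyPDF2 (< 3.0) API

    reader = PyPDF2.PdfFileReader(src_path)
    pages = [reader.getPage(i) for i in range(reader.getNumPages())]
    # Assumption: extracted text is a good enough proxy for "same page"
    fingerprints = [text_fingerprint(page.extractText()) for page in pages]

    writer = PyPDF2.PdfFileWriter()
    for index in first_occurrences(fingerprints):
        writer.addPage(pages[index])
    with open(dst_path, "wb") as out:
        writer.write(out)
```

With these helpers, calling `dedupe_pdf(filename_path, "output.pdf")` would write the deduplicated file, under the assumptions stated above.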