How to iterate through multiple PDFs to extract the first page from each file using Python?

Time:10-05

I've written the code below to extract only the first page of a PDF, and it works as expected:

from PyPDF2 import PdfFileReader, PdfFileWriter

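# Raw string, so backslashes such as \f are not treated as escape sequences
path = r"C:\Users\abc\Data\first.pdf"
pdf = PdfFileReader(path)
file_ext = path.replace('.pdf', '')  # path without the .pdf extension
pdfpage = [0]  # zero-based indices of the pages to keep
# Use a separate name so the PdfFileWriter class is not overwritten
writer = PdfFileWriter()

for page_num in pdfpage:
    writer.addPage(pdf.getPage(page_num))
# The with block closes the file, so no explicit close() is needed
with open('{0}_output.pdf'.format(file_ext), 'wb') as a:
    writer.write(a)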

My question is: how can I extract the first page from multiple PDFs?

For example, I have PDFs such as first.pdf, second.pdf, third.pdf and so on, all in the same folder. Any suggestions would be appreciated.

CodePudding user response:

You can wrap your code in a function and call it once for each file:

from PyPDF2 import PdfFileReader, PdfFileWriter

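def function_for_one_pdf(name):
    # name is the file name including its extension, e.g. 'first.pdf'
    path = f"C:\\Users\\abc\\Data\\{name}"
    pdf = PdfFileReader(path)
    file_ext = path.replace('.pdf', '')  # path without the .pdf extension
    pdfpage = [0]
    # Use a separate name: rebinding PdfFileWriter inside the function
    # makes it a local variable and raises UnboundLocalError
    writer = PdfFileWriter()

    for page_num in pdfpage:
        writer.addPage(pdf.getPage(page_num))
    # The with block closes the file, so no explicit close() is needed
    with open('{0}_output.pdf'.format(file_ext), 'wb') as a:
        writer.write(a)


list_of_pdfs = ['first.pdf', 'second.pdf', 'third.pdf']

for i in list_of_pdfs:
    function_for_one_pdf(i)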
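
If you would rather not maintain the list of file names by hand, the folder can be scanned instead. The following is only a sketch of that idea, not part of the answer above: the helper names (find_input_pdfs, output_path_for, extract_first_page, extract_first_pages) are made up for illustration, and it assumes the same older PyPDF2 API used in the question. Newer PyPDF2 releases rename these calls (PdfReader, PdfWriter, add_page, reader.pages), so adjust them if your installed version rejects the old names.

```python
from pathlib import Path


def find_input_pdfs(folder):
    """Return the PDFs in `folder`, skipping outputs from an earlier run."""
    return sorted(
        p for p in Path(folder).glob("*.pdf")
        if not p.stem.endswith("_output")
    )


def output_path_for(pdf_path):
    """Build '<name>_output.pdf' next to the input file."""
    pdf_path = Path(pdf_path)
    return pdf_path.with_name(pdf_path.stem + "_output.pdf")


def extract_first_page(pdf_path):
    """Write page 0 of `pdf_path` to its '<name>_output.pdf' sibling."""
    # Imported here so the two path helpers above work without PyPDF2.
    # Assumption: an older PyPDF2 release that still provides these names.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    reader = PdfFileReader(str(pdf_path))
    writer = PdfFileWriter()
    writer.addPage(reader.getPage(0))
    with open(output_path_for(pdf_path), "wb") as out:
        writer.write(out)


def extract_first_pages(folder):
    """Process every input PDF in `folder`; return the paths written."""
    written = []
    for pdf_path in find_input_pdfs(folder):
        extract_first_page(pdf_path)
        written.append(output_path_for(pdf_path))
    return written
```

Calling extract_first_pages(r"C:\Users\abc\Data") would then write first_output.pdf, second_output.pdf and so on next to the originals. Files whose names already end in _output are skipped, so running it a second time does not produce first_output_output.pdf.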