Python only reads the first PDF file

Time:08-21

I am using the following code for testing. However, it only reads the first PDF in the directory. My PDF files are named test, test1 and test2, but only test is read. The loop prints the names of all the PDFs in the directory, yet the read code only processes one of them. I am not sure whether the line that uses file_name is correct or should be written differently.

Any help would be appreciated. My code is below for reference:

#date
from datetime import*
import PyPDF2
import os
import re
today_date = datetime.today()
print('Today is:' , today_date)
#file list
for file_name in os.listdir(r"C:\\Nikhar\Work\Quantum\Work"):
    print(file_name)
#read all file in directory  
load_pdf = open(r"C:\\Nikhar\\Work\\Quantum\\Work\\" + file_name, "rb")
read_pdf = PyPDF2.PdfFileReader(load_pdf)
page_count = read_pdf.getNumPages()
first_page = read_pdf.getPage(0)
page_content = first_page.extractText()
page_content = page_content.replace('\n', '')
print(page_content)

CodePudding user response:

You must simply indent the code that should be executed in the for loop:

#date
from datetime import*
import PyPDF2
import os
import re
today_date = datetime.today()
print('Today is:' , today_date)
#file list
for file_name in os.listdir(r"C:\\Nikhar\Work\Quantum\Work"):
    print(file_name)
    #read all file in directory  
    load_pdf = open(r"C:\\Nikhar\\Work\\Quantum\\Work\\" + file_name, "rb")
    read_pdf = PyPDF2.PdfFileReader(load_pdf)
    page_count = read_pdf.getNumPages()
    first_page = read_pdf.getPage(0)
    page_content = first_page.extractText()
    page_content = page_content.replace('\n', '')
    print(page_content)
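
To see why the indentation matters, here is a minimal sketch that leaves out PyPDF2 entirely. The two helper functions and the file names are hypothetical, chosen only to mirror the two layouts. Code placed after the loop runs once, with the loop variable still bound to the last item. Code indented under the for runs once per item.

```python
def process_outside_loop(names):
    """Mimics the question's layout: the work happens after the loop ends."""
    processed = []
    for name in names:
        pass  # the loop body only touches each name (the original printed it)
    # Not indented under the for, so this runs once,
    # with `name` still bound to the last item of the loop.
    processed.append(name)
    return processed


def process_inside_loop(names):
    """Mimics the answer's layout: the work is indented into the loop body."""
    processed = []
    for name in names:
        processed.append(name)  # runs once per item
    return processed


files = ["test.pdf", "test1.pdf", "test2.pdf"]
print(process_outside_loop(files))  # ['test2.pdf']
print(process_inside_loop(files))   # ['test.pdf', 'test1.pdf', 'test2.pdf']
```

With the unindented layout the single file that gets read is whichever name os.listdir returned last, and os.listdir makes no promise about ordering, so it may or may not be the first file alphabetically.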

CodePudding user response:

You have to indent the code so that it runs on every iteration of the loop, like this:

#date
from datetime import*
import PyPDF2
import os
import re
today_date = datetime.today()
print('Today is:' , today_date)
#file list
for file_name in os.listdir(r"C:\\Nikhar\Work\Quantum\Work"):
    print(file_name)
    #read all file in directory  
    load_pdf = open(r"C:\\Nikhar\\Work\\Quantum\\Work\\" + file_name, "rb")
    read_pdf = PyPDF2.PdfFileReader(load_pdf)
    page_count = read_pdf.getNumPages()
    first_page = read_pdf.getPage(0)
    page_content = first_page.extractText()
    page_content = page_content.replace('\n', '')
    print(page_content)
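
On the asker's doubt about the file_name line: one possible way to build the paths, sketched here with only the standard library, is os.path.join plus a filter on the .pdf extension. The helper name list_pdf_paths and the temporary directory are assumptions made for this illustration and are not part of the original code.

```python
import os
import tempfile


def list_pdf_paths(directory):
    """Return the sorted full paths of the .pdf files directly inside directory."""
    paths = []
    for file_name in os.listdir(directory):
        full_path = os.path.join(directory, file_name)
        # Skip sub-directories and anything that is not a PDF by extension.
        if os.path.isfile(full_path) and file_name.lower().endswith(".pdf"):
            paths.append(full_path)
    return sorted(paths)


# Demo in a throwaway directory so the sketch does not depend on a real folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ["test.pdf", "test1.pdf", "test2.pdf", "notes.txt"]:
        with open(os.path.join(folder, name), "wb") as handle:
            handle.write(b"")  # empty placeholder files, not real PDFs
    os.mkdir(os.path.join(folder, "archive.pdf"))  # a directory, so it is skipped
    found = [os.path.basename(path) for path in list_pdf_paths(folder)]

print(found)  # ['test.pdf', 'test1.pdf', 'test2.pdf']
```

Each returned path could then be passed to open(path, "rb") inside the loop, in place of the string concatenation.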