Looping through folder to read pdf files

Time:03-09

I have this code, which worked fine for an individual PDF file, but it stopped working once I added a loop. I want to loop through multiple PDF files laid out as folder --> subfolders --> PDF files. The path I pass in does not include the subfolders.

import fitz
import os
path = "/users/folder"

for i in os.listdir(path):
    if i.endswith(".pdf"):
       with fitz.open(path) as doc:
          text = ""
          for page in doc:
              text += page.getText().strip()
          return text

CodePudding user response:

You are passing the "path" variable, which is the folder, to fitz.open() instead of the individual file. Join the folder with each file name. Try this:

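import fitz  # PyMuPDF
import os

path = "/users/folder"

for i in os.listdir(path):
    if i.endswith(".pdf"):
        # Open the file inside the folder, not the folder itself
        with fitz.open(path + "/" + i) as doc:
            text = ""
            for page in doc:
                # get_text() is the current name; older PyMuPDF releases call it getText()
                text += page.get_text().strip()
        # "return" only works inside a function, so print the text instead
        print(text)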
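
The cause can be reproduced without any PDF library. os.listdir() returns bare entry names, not full paths, so each name has to be joined with the folder before it can be opened. Here is a minimal sketch using only the standard library; the helper name pdf_paths_in is hypothetical and is not part of fitz or os:

```python
import os


def pdf_paths_in(folder):
    """Return full paths of the .pdf files directly inside folder (no subfolders)."""
    paths = []
    for name in sorted(os.listdir(folder)):
        # name is only something like "report.pdf", so join it with the folder
        full_path = os.path.join(folder, name)
        if name.lower().endswith(".pdf") and os.path.isfile(full_path):
            paths.append(full_path)
    return paths
```

Each returned path can be passed to fitz.open(). Note that os.listdir() does not descend into subfolders, so PDFs one level down are not found this way.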

CodePudding user response:

First build a list of the PDF files to process, then open each one.

The following code has been checked for syntax errors only. It has not been tested for logical errors because I had no test files.

import fitz  # PyMuPDF
import os

pdf_files = []
path = "/users/folder"
for root, dirs, files in os.walk(path):
    # root starts as path, then becomes each subdirectory in turn
    for file in files:
        # Every file directly inside the current root is checked
        if file.endswith(".pdf"):
            pdf_files.append(os.path.join(root, file))  # full path to the file

for file in pdf_files:
    # Open each collected file, not the top-level folder
    with fitz.open(file) as doc:
        text = ""
        for page in doc:
            # get_text() is the current name; older PyMuPDF releases call it getText()
            text += page.get_text().strip()
    print(text)
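
The question's loop ends in "return text", which only works inside a function, so one way to package the walk is as two small functions. This is a sketch under assumptions, not a tested solution. The names find_pdfs and read_all_pdfs are hypothetical, fitz is imported lazily so the file search works even without PyMuPDF installed, and get_text() assumes a recent PyMuPDF release (older ones call it getText()).

```python
import os


def find_pdfs(path):
    """Return the full paths of all .pdf files under path, including subfolders."""
    pdf_files = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.lower().endswith(".pdf"):
                pdf_files.append(os.path.join(root, name))
    return sorted(pdf_files)


def read_all_pdfs(path):
    """Map each PDF path under path to its extracted text."""
    import fitz  # PyMuPDF; imported here so find_pdfs() works without it

    texts = {}
    for pdf_path in find_pdfs(path):
        with fitz.open(pdf_path) as doc:
            # Join page texts with newlines so pages do not run together
            texts[pdf_path] = "\n".join(page.get_text().strip() for page in doc)
    return texts
```

Calling read_all_pdfs("/users/folder") would return a dict keyed by file path, so the text of one file no longer overwrites the text of the previous one.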