I have this code, which worked fine for an individual PDF file, but it stopped working once I added a loop. I want to loop through multiple PDF files laid out as folder --> subfolders --> PDF files. The path I give does not include the subfolders.
import fitz
import os

path = "/users/folder"
for i in os.listdir(path):
    if i.endswith(".pdf"):
        with fitz.open(path) as doc:
            text = ""
            for page in doc:
                text = page.getText().strip()
        return text
CodePudding user response:
You are passing "path", which is the folder, to fitz.open instead of the individual file. Join the folder and the file name before opening. Try this:
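import fitz  # PyMuPDF
import os

path = "/users/folder"
for i in os.listdir(path):
    if i.endswith(".pdf"):
        # open the file itself, not the folder
        with fitz.open(os.path.join(path, i)) as doc:
            text = ""
            for page in doc:
                # get_text() is the current PyMuPDF name for getText()
                text += page.get_text().strip()
        print(text)  # "return" is only valid inside a function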
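If the text is needed as a return value rather than printed, the loop can be wrapped in functions. The following is only a sketch: the names extract_pdf_text and texts_in_folder, and the extract parameter, are made up for illustration and are not part of PyMuPDF. It still looks only at files directly inside the folder, not in subfolders.

```python
import os


def extract_pdf_text(pdf_path):
    """Return the text of every page in one PDF, joined with newlines."""
    # PyMuPDF is imported here so the folder helper below can be used
    # (and tested) even where PyMuPDF is not installed.
    import fitz

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text().strip() for page in doc)


def texts_in_folder(folder, extract=extract_pdf_text):
    """Map each PDF file name directly inside folder to its extracted text.

    extract is a hypothetical hook: any callable that takes a file path
    and returns a string.
    """
    results = {}
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".pdf"):
            results[name] = extract(os.path.join(folder, name))
    return results
```

Calling texts_in_folder("/users/folder") would then give one entry per PDF, so the text of one file is no longer overwritten by the next.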
CodePudding user response:
First get a list of the PDF files to process, including those in subfolders, then open each one.
The following code has been tested for syntax. There are no errors.
However, it has not been tested for logical errors due to a lack of test files.
import fitz  # PyMuPDF
import os

pdf_files = []
path = "/users/folder"
for root, dirs, files in os.walk(path):
    # root is initially path
    # on each later iteration root is the next subdirectory found
    for file in files:
        # all files in the current root are checked
        if file.endswith(".pdf"):
            pdf_files.append(os.path.join(root, file))  # full path to the file

for file in pdf_files:
    # open each collected file, not the top-level folder
    with fitz.open(file) as doc:
        text = ""
        for page in doc:
            # get_text() is the current PyMuPDF name for getText()
            text += page.get_text().strip()
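The same folder --> subfolders --> PDF search can also be sketched with pathlib instead of os.walk. This is one possible variant, not the approach from the answer above; find_pdfs and read_all are hypothetical helper names, and matching the extension case-insensitively is an assumption about what is wanted.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return every .pdf file under folder, including all subfolders."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def read_all(folder):
    """Map each PDF path (as a string) to the text of all its pages."""
    # Imported here so find_pdfs works without PyMuPDF installed.
    import fitz

    texts = {}
    for pdf in find_pdfs(folder):
        with fitz.open(str(pdf)) as doc:
            texts[str(pdf)] = "".join(page.get_text() for page in doc)
    return texts
```

Keeping the file search separate from the text extraction makes it easy to print the result of find_pdfs first and confirm the right files are found before any PDF is opened.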