Python: traverse a folder of PDFs and convert them to images

Time:10-31

Drawing on posts from several experts, I ended up with the following:
# -*- coding: utf-8 -*-

import datetime
import os
import re

import fitz  # PyMuPDF

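def get_file_list(dir, file_type_list=('pdf', 'txt', 'csv', 'xlsx'), file_list=None):
    '''Collect the paths of files of the given types under a folder.
    :param dir: folder path
    :param file_type_list: file extensions to match (without the dot, case-insensitive)
    :param file_list: optional list to append the matches to
    :return: list of matching file paths'''
    if file_list is None:  # avoid sharing one mutable default list across calls
        file_list = []
    wanted = {t.strip().lower() for t in file_type_list}
    for root, _, files in os.walk(dir):
        for file in files:
            file_type = file[file.rfind('.') + 1:].lower()
            if file_type in wanted:
                file_list.append(os.path.join(root, file))
    return file_list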

def get_file_name(path_string):
    """Return the file name without its extension (None if there is no extension)."""
    pattern = re.compile(r'([^<>/\\\|:""\*\?]+)\.\w+$')
    data = pattern.findall(path_string)
    if data:
        return data[0]

def pyMuPDF_fitz(file_dir_path, out_file_path, file_type_list=('pdf', 'txt', 'csv', 'xlsx')):
    startTime_pdf2img = datetime.datetime.now()  # start time

    if not os.path.exists(out_file_path):  # does the output folder exist?
        os.makedirs(out_file_path)  # create it if not
    print("imagePath=" + out_file_path)
    file_paths = get_file_list(file_dir_path, file_type_list)  # get the file list
    n = 0  # files converted
    t = 0  # pages converted in total
    for file in file_paths:
        n = n + 1
        pdfDoc = fitz.open(file)
        print("convert file=" + file)
        m = 0  # pages converted in this file
        for pg in range(pdfDoc.page_count):  # pageCount in older PyMuPDF releases
            page = pdfDoc[pg]
            rotate = int(0)
            # Zoom factor per axis: 2 doubles the width and height in pixels (144 dpi instead of 72).
            # Without it, the default image size is 792x612 at dpi=72.
            zoom_x = 2  # 1.33333333 -> 1056x816, 2 -> 1584x1224
            zoom_y = 2  # 1.33333333
            mat = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)  # preRotate in older releases
            pix = page.get_pixmap(matrix=mat, alpha=False)  # getPixmap in older releases
            # write the image into the output folder (writePNG in older releases)
            pix.save(out_file_path + '/' + get_file_name(file) + '_images_%s.png' % pg)
            m = m + 1
            t = t + 1
        print("convert file=" + file, m, 'pages finished.')
    endTime_pdf2img = datetime.datetime.now()  # end time
    print('pdf2img time=', (endTime_pdf2img - startTime_pdf2img).seconds, 'seconds;', n, 'files and', t, 'pages converted.')

if __name__ == "__main__":
    pdfPath = './forConvert'  # relative path
    imagePath = '/path/image'
    pyMuPDF_fitz(pdfPath, imagePath, ['pdf'])
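
The zoom comments in the script can be checked with a little arithmetic. The sketch below is not part of the original script: scaled_size and effective_dpi are hypothetical helper names, and it assumes the 792x612 page at 72 dpi quoted in the comments. It only multiplies numbers and does not call PyMuPDF, so the real pixmap size could differ by a pixel where the product is not a whole number.

```python
# Hypothetical helpers (not from the original script): plain arithmetic, no PyMuPDF calls.
BASE_DPI = 72  # default rendering resolution quoted in the script's comments


def scaled_size(width, height, zoom):
    """Pixel size of a page rendered with the same zoom factor on both axes."""
    return round(width * zoom), round(height * zoom)


def effective_dpi(zoom, base_dpi=BASE_DPI):
    """Resolution that corresponds to a given zoom factor."""
    return round(base_dpi * zoom)


if __name__ == "__main__":
    # Compare the default size with the two zoom factors mentioned in the comments.
    for zoom in (1, 4 / 3, 2):
        print(zoom, scaled_size(792, 612, zoom), effective_dpi(zoom))
```

A zoom of 2 gives 1584x1224, matching the comment next to zoom_x, and corresponds to 144 dpi.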

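The regular expression in get_file_name could probably be replaced by the standard library's pathlib. This is a swapped-in technique, not the original author's method, and file_stem is a hypothetical name. One difference to keep in mind: for a name with no extension it returns the name unchanged, whereas the regex version returns None.

```python
from pathlib import Path


def file_stem(path_string):
    """Return the last path component without its final suffix.

    Hypothetical alternative to get_file_name; not from the original script.
    """
    return Path(path_string).stem


if __name__ == "__main__":
    print(file_stem('./forConvert/report.pdf'))
```

Only the last suffix is dropped, so a name with two extensions keeps the first one, which is the same behaviour as the greedy regex.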