I am looking for help with an "any document converter" that converts any document file [doc, docx, ppt, pptx] to PDF. DOCX and PPTX are easy to handle with Python libraries, but DOC and PPT are a bit tricky.
The answers I got 7 months ago were quite hard to work with, especially the one that uses Unoconv (which is now deprecated and has been replaced by Unoserver).
Initial code example:
import os
import shutil

src = ".../srcpaths"
dst = ".../dstpaths"
# os.path.splitext() returns the extension with its leading dot
ext = ['.ppt', '.pptx', '.doc', '.docx']

# copy every matching document from the source tree into dst
for root, subfolders, filenames in os.walk(src):
    for filename in filenames:
        if os.path.splitext(filename)[1] in ext:
            shutil.copy2(os.path.join(root, filename), os.path.join(dst, filename))

def ConvertToPDF(ext):
    pass  # some code

ConvertToPDF('.ppt')
ConvertToPDF('.pptx')
ConvertToPDF('.doc')
ConvertToPDF('.docx')
CodePudding user response:
Below is my review of the available solutions, with my answer at the end:
1. Pandoc:
- requires a PDF engine such as pdflatex (LaTeX)
- does not preserve the layout of files well
- loss of formatting
- problems with graphics
- problems with charts
- problems with fonts
- limited choice of input formats
2. Unoconv/Unoserver:
- hard to install and work with
- requires LibreOffice as the engine
- good conversion results (not perfect)
3. Cloud-based solutions:
- not free
- not open-source friendly
- privacy concerns
4. Google Drive API converter:
- uses someone’s Google account
- workflow: upload the document, convert it, save it as PDF
- privacy concerns
5. LibreLambda:
- uses Amazon Web Services (AWS)
- privacy concerns
Simple solution:
Run LibreOffice directly as a command-line subprocess.
Requirement: LibreOffice must be installed. Biggest advantage: it can run on both Windows and Linux (the code below needs to be modified for Linux).
Here is my Python code for Windows:
import os
import subprocess
# path to the LibreOffice executable (the conversion engine)
path_to_office = r"C:\Program Files\LibreOffice\program\soffice.exe"
# folder with the files to convert
source_folder = r"C:\ConvertToPDF\input_files"
# folder for the resulting PDF files
output_folder = r"C:\ConvertToPDF\output_files"
# change the working directory to the source folder
os.chdir(source_folder)
# build and run the command that converts the files through LibreOffice
command = f"\"{path_to_office}\" --convert-to pdf --outdir \"{output_folder}\" *.*"
# check=True raises CalledProcessError if LibreOffice exits with a non-zero code
subprocess.run(command, check=True)
print('Converted')
If you can adapt it to Linux, please feel free to share your solution.
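One possible Linux-friendly adaptation, as a minimal sketch rather than a tested drop-in replacement. It assumes the LibreOffice launcher is on PATH under the name soffice or libreoffice (this varies by distribution and install method). The helper names find_office, find_documents, build_command and convert_folder are hypothetical and chosen only for illustration. Because no shell is involved, the *.* wildcard is not expanded on Linux, so the files are listed explicitly in Python and passed as separate arguments.

```python
import shutil
import subprocess
from pathlib import Path

# Extensions taken from the question; compared case-insensitively.
DOC_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx"}


def find_office(candidates=("soffice", "libreoffice")):
    """Return the first LibreOffice launcher found on PATH, or None.

    The candidate names are an assumption; adjust them for your system.
    """
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None


def find_documents(source_folder):
    """List files directly inside source_folder with a supported extension."""
    return sorted(
        p for p in Path(source_folder).iterdir()
        if p.is_file() and p.suffix.lower() in DOC_EXTENSIONS
    )


def build_command(office, output_folder, files):
    """Build the argument list; passing a list avoids shell quoting issues."""
    return [str(office), "--headless", "--convert-to", "pdf",
            "--outdir", str(output_folder), *map(str, files)]


def convert_folder(source_folder, output_folder, office=None):
    """Convert every supported document in source_folder to PDF.

    Returns the paths where the PDFs are expected to appear.
    """
    office = office or find_office()
    if office is None:
        raise FileNotFoundError("LibreOffice launcher was not found on PATH")
    files = find_documents(source_folder)
    if not files:
        return []
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(build_command(office, output_folder, files), check=True)
    return [Path(output_folder) / (f.stem + ".pdf") for f in files]
```

Calling convert_folder with an input folder and an output folder would then run a single LibreOffice process for all matching files. The same sketch should also work on Windows if the full path to soffice.exe is passed as the office argument.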