Home > Software design >  Python Subprocess Loop runs Twice
Python Subprocess Loop runs Twice

Time:12-31

So, I created a Python script to batch convert PDF files using Ghostscript. Ideally it should work, but I am not sure why it isn't working. For now, it is going through the input PDF files twice and when it runs the second time, it overwrites the output files.

Here's the script.

from __future__ import print_function
import os
import subprocess

try:
   os.mkdir('compressed')
except FileExistsError:
   pass   

for root, dirs, files in os.walk("."):
   for file in files:
      if file.endswith(".pdf"):
         filename = os.path.join(root, file)
         arg1= '-sOutputFile='   './compressed/'   file
         print ("compressing:", file )
         p = subprocess.Popen(['gs', '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4', '-dPDFSETTINGS=/screen', '-dNOPAUSE', '-dBATCH',  '-dQUIET', str(arg1), filename], stdout=subprocess.PIPE).wait()

Here's the ouput.

enter image description here

I am missing what did I do wrong.

CodePudding user response:

file is just the name of the file. You have several files called the same in different directories. Don't forget that os.walk recurses in subdirectories by default.

So you have to save the converted files in a directory or name which depends on root.

and put the output directory outside the current directory as os.walk will scan it

For instance, for flat output replace:

arg1= '-sOutputFile='   './compressed/'   file

by

arg1= '-sOutputFile='   '/somewhere/else/compressed/'   root.strip(".").replace(os.sep,"_") "_" file

The expression

root.strip(".").replace(os.sep,"_")

should create a "flat" version of root tree without current directory (no dot) and path separators converted to underscores, plus one final underscore. That's one option that would work.

An alternate version that won't scan ./compressed or any other subdirectory (maybe more what you're looking for) would be using os.listdir instead (no recursion)

root = "."
for file in os.listdir(root):
  if file.endswith(".pdf"):
     filename = os.path.join(root, file)
     arg1= '-sOutputFile='   './compressed/'   file
     print ("compressing:", file )

Or os.scandir

root = "."
for entry in os.scandir(root):
  file = entry.name
  if file.endswith(".pdf"):
     filename = os.path.join(root, file)
     arg1= '-sOutputFile='   './compressed/'   file
     print ("compressing:", file )

CodePudding user response:

Your problem is that os.walk will also retrieve that contents in "compressed" directory. This is because the files will be compressed and created before os.walk list files in that directory. If you add print(os.path.join(root, file)) to your for-loop you will notice that.

Bellow is a snippet that works since the files retrieved are only the ones in the current directory.

import os

os.makedirs("compressed", exist_ok=True)

for file in os.listdir("."):
    if not os.path.isfile(file):
        continue
    if not file.endswith(".pdf"):
        continue
    print(file)

CodePudding user response:

os.walk will by definition enter into subdirectories, so you are compressing the files in the compressed subdirectory a second time.

Probably you simply want

for file in os.scandir("."):
   ...

As an aside, you almost certainly want to avoid Popen in favor of subprocess.run() or one of its legacy variations.

CodePudding user response:

On the first iteration of for root, dirs, files in os.walk(".") you find the files in the current directory, then you compress them into the ./compressed/*.pdf path.

After that the second iteration of the outer loop will find the already compressed files in the subdirectory.

Easiest fix is to move the output directory outside of the input directory (or create an input directory next to the compressed dir, and read the files from there instead of .)

  • Related