Is there an efficient way to run a program on files with similar names using Python in the terminal?

Time:01-08

I'm trying to process PDFs using PyMuPDF and I'm running this python file called process_pdf.py in the terminal.

> import sys, fitz
> fname = sys.argv[1]  # get document filename
> doc = fitz.open(fname)  # open document
> out = open(fname + ".txt", "wb")  # open text output
> for page in doc:  # iterate the document pages
>     text = page.get_text().encode("utf8")  # get plain text (is in UTF-8)
>     out.write(text)  # write text of page
> out.close()

Then I feed a PDF to it in the terminal, for example python process_pdf.py 1.pdf. This produces 1.pdf.txt (a text version of 1.pdf). My question: can I make a simple program in the terminal that runs python process_pdf.py document_name.pdf multiple times, the way a for-loop works? I ask because the file names are sequential numbers.

I thought about making a for-loop such as

> for i in range(1,101): 
>     python process_pdf.py i.pdf

But that isn't how Python works. P.S. Sorry if this doesn't make any sense; I'm very new to coding :(

CodePudding user response:

Well, yes. You can execute any process with Python, including python.exe (or /usr/bin/python3 on Linux), and give it any parameters you want.

subprocess.Popen, os.system, etc.
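A minimal sketch of the subprocess route, assuming process_pdf.py sits in the current directory and the files are named 1.pdf through 100.pdf as in the question. The helper names build_command and run_all are made up for illustration; sys.executable is used so the child runs under the same interpreter as the caller.

```python
import subprocess
import sys


def build_command(number, script="process_pdf.py"):
    """Return the argument list for one run, e.g. [python, script, '3.pdf']."""
    return [sys.executable, script, f"{number}.pdf"]


def run_all(start=1, stop=100, script="process_pdf.py"):
    """Run the script once per numbered PDF, stopping at the first failure."""
    for number in range(start, stop + 1):
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(build_command(number, script), check=True)
```

Calling run_all() then behaves like typing python process_pdf.py 1.pdf, python process_pdf.py 2.pdf, and so on by hand. It starts a fresh interpreter for every file, which works but is slower than looping inside one process.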

There are some better ways mentioned here for specifically running Python scripts from Python (runpy).

But... this feels like an XY problem.

How about simply generating the file names in the code?

import fitz

for i in range(1, 101):
    fname = f"{i}.pdf"  # build the document filename
    doc = fitz.open(fname)  # open document
    out = open(fname + ".txt", "wb")  # open text output
    for page in doc:  # iterate the document pages
        text = page.get_text().encode("utf8")  # get plain text (is in UTF-8)
        out.write(text)  # write text of page
    out.close()

Also, I'm unfamiliar with "fitz", but you may need to close the "doc" file too. Check out the "with" statement.
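Here is a rough sketch of what the "with" version could look like. It assumes the object returned by fitz.open can be used as a context manager (PyMuPDF documents this for its Document class). The function name pdf_to_text is hypothetical, and fitz is imported inside the function only so the file can be loaded without PyMuPDF installed.

```python
def pdf_to_text(fname):
    """Write the plain text of fname to fname + '.txt' and return that name."""
    import fitz  # PyMuPDF; imported lazily, see the note above

    out_name = fname + ".txt"
    # Both the document and the output file are closed when the block ends,
    # even if an exception is raised partway through.
    with fitz.open(fname) as doc, open(out_name, "wb") as out:
        for page in doc:  # iterate the document pages
            out.write(page.get_text().encode("utf8"))
    return out_name
```

With that in place the loop body shrinks to pdf_to_text(f"{i}.pdf").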

CodePudding user response:

If you want to execute the for loop from the Python shell and you don't want to use subprocess, then rewrite the module and put the instructions in a function.

process_pdf.py

import fitz

def func(fname):
    doc = fitz.open(fname)  # open document
    with open(fname + ".txt", "wb") as out:  # open text output
        for page in doc:  # iterate the document pages
            text = page.get_text().encode("utf8")  # get plain text (is in UTF-8)
            out.write(text)  # write text of page

Import the function in the python shell and call it in the for loop.

>>> from process_pdf import func
>>> for i in range(1,101):
...     func('{}.pdf'.format(i))
...     # func(f'{i}.pdf')
... 

Or import the module and call the function using dot notation.

>>> import process_pdf
>>> for i in range(1,101):
...     process_pdf.func('{}.pdf'.format(i))
...     # process_pdf.func(f'{i}.pdf')
... 
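Once the work lives in a function, the module no longer does anything when run from the terminal. One possible way to keep both uses is a main guard, sketched below. The main function is an assumption for illustration, and func here is only a stand-in that prints, so the sketch runs without PyMuPDF; in the real module it would be the func defined in process_pdf.py.

```python
import sys


def func(fname):
    """Stand-in for the real func in process_pdf.py; it only prints."""
    print(f"would process {fname}")


def main(argv):
    """Process every filename in argv and return how many were handled."""
    if not argv:
        print("usage: python process_pdf.py FILE.pdf [FILE.pdf ...]")
        return 0
    for fname in argv:
        func(fname)
    return len(argv)


# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main(sys.argv[1:])
```

With this layout, python process_pdf.py 1.pdf 2.pdf 3.pdf handles several files in one run, while from process_pdf import func still works in the Python shell.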