Run all lines with !python commands in a notebook

I have a text file in which each line is a !python command with 2 arguments. For example:

!python example.py 12345.mp4 12345.pkl
!python example.py 111.mp4 111.pkl
!python example.py 123.mp4 123.pkl
!python example.py 44441.mp4 44441.pkl
!python example.py 333.mp4 333.pkl
...

I want to run all of those lines in a notebook environment (Microsoft Azure ML Notebook or Google Colab). When I copy and paste, only a few lines (around 500) can be pasted into a notebook code block, and I have tens of thousands of lines. Is there a way to achieve this?

I have thought of using a for loop to reproduce the text file, but as far as I know I can't run !python commands inside a Python for loop.

Edit: I should add that these mp4 files are in the same folder as the Python code and the text file containing those lines. So I want to run example.py for every file in a single folder, with a second argument that replaces the .mp4 extension with .pkl (because that is the name of the output file the command writes). Maybe a better, faster solution is possible now. My example.py file can be found here: https://github.com/open-mmlab/mmaction2/blob/90fc8440961987b7fe3ee99109e2c633c4e30158/tools/data/skeleton/ntu_pose_extraction.py

CodePudding user response：

Running thousands of Python interpreters seems like a really bad design, as each interpreter needs a non-negligible amount of time to start. That said, you can just remove the exclamation mark and run each line using os.system.

import os
with open("file.txt", "r") as f:
    for line in f:
        command = line.strip().lstrip("!")  # drop the newline (\r\n) and the leading !
        if command:  # skip blank lines
            os.system(command)

This could take a very long time to finish if you are starting tens of thousands of Python interpreters. If you know what the file does, you are much better off running all the code inside a single interpreter, in multiple processes using multiprocessing.
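If the script cannot easily be imported, a middle ground is to keep one process per command but run several of them at once. This is a different technique from the multiprocessing approach described above: it does not remove the per-interpreter startup cost, it only overlaps it. The following is a minimal sketch, not a definitive implementation. The helper names read_commands and run_commands and the default max_workers=4 are assumptions made for illustration; tune the worker count to what the machine (and any GPU the script uses) can handle.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_commands(path):
    """Return each non-empty line of `path` as an argument list, without the leading '!'."""
    commands = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # shlex.split handles quoted file names that contain spaces
            commands.append(shlex.split(line.lstrip("!")))
    return commands


def run_commands(commands, max_workers=4):
    """Run the commands, at most `max_workers` at a time; return the ones that failed."""
    def run_one(args):
        # Each thread just waits for its child process, so threads are sufficient here.
        return args, subprocess.run(args).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one, commands))
    return [args for args, code in results if code != 0]
```

In a notebook cell this could be used as failed = run_commands(read_commands("file.txt")), after which failed holds the argument lists of any commands that exited with a non-zero status.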

CodePudding user response：

What you are asking seems misdirected. Running the commands specifically in a notebook only makes sense if each command produces some output which you want to display in the notebook; and even then, if there are more than a few, you want to automate things.

Either way, a simple shell script will easily loop over all the files.

#!/bin/sh
for f in *.mp4; do
    python example.py "$f" "${f%.mp4}.pkl"
done

If you really insist on running the above from a notebook, saving it in a file (say, allmp4) and running chmod +x on that file will let you run it with ! at any time (simply ! ./allmp4).

(The above instructions are OS-dependent; if you are running your notebook on Windows, the commands will be different, and sometimes bewildering to the point where you probably want to remove Windows.)

Equivalently, anything you can put in a script can be run interactively; depending on the exact notebook, you might not have access to a full shell in ! commands, in which case you can get one with sh -c '... your commands ...'. In general, newlines can be replaced with semicolons in shell scripts, though there are a few contexts where newlines translate to just whitespace (like after then and do).

Quite similarly, you can run python -c '... your python code ...' though complex Python code is hard to serialize into a one-liner. But then, your notebook already runs Python, so you can just put your loop in a cell, and run that.

from pathlib import Path
import subprocess

for f in Path(".").glob("*.mp4"):
    subprocess.run(
        ["python", "example.py",
         str(f), str(f.with_suffix(".pkl"))],
        check=True, text=True)

... though running Python as a subprocess of itself is often inefficient and clumsy; if you can import example and run its main function directly, you have more control (in particular around error handling), and more opportunities to use multiprocessing or other facilities for parallel processing etc. If this requires you to refactor example.py somewhat, perhaps weigh reusability against immediate utility - if the task is a one-off, getting it done quickly might be more important than getting it done right.
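One way to avoid starting a new interpreter per file without refactoring anything is runpy.run_path from the standard library, which executes a script file inside the current process. The sketch below assumes that example.py reads its arguments from sys.argv (for instance via argparse) under an if __name__ == "__main__": guard. The names run_script_in_process and convert_all are hypothetical helpers, not part of example.py. Note that this re-executes the script's top-level code for every file, so any expensive setup the script performs is still repeated; importing the script and calling its functions directly would avoid that.

```python
import runpy
import sys
from pathlib import Path


def run_script_in_process(script, args):
    """Execute `script` as if run with `python script *args`, inside this interpreter."""
    saved_argv = sys.argv
    sys.argv = [str(script), *map(str, args)]
    try:
        runpy.run_path(str(script), run_name="__main__")
    finally:
        sys.argv = saved_argv  # always restore the notebook's own argv


def convert_all(folder, script="example.py"):
    """Run `script` on every .mp4 in `folder`; return (file name, error) pairs for failures."""
    failed = []
    for video in sorted(Path(folder).glob("*.mp4")):
        output = video.with_suffix(".pkl")
        if output.exists():
            continue  # already processed, so an interrupted run can be resumed
        try:
            run_script_in_process(script, [video, output])
        except SystemExit as exc:
            # argparse and sys.exit() raise SystemExit; a zero/None code means success
            if exc.code not in (None, 0):
                failed.append((video.name, repr(exc)))
        except Exception as exc:
            failed.append((video.name, repr(exc)))
    return failed
```

Calling failed = convert_all(".") in a cell processes the current folder and collects the files whose run raised an error, instead of stopping at the first failure.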