Python subprocess performance for multiple pipelined commands

Time:06-17

I am writing Python code with the subprocess module, and I need to pipe the output of one command into another to get the specific data I want.

However, this also can be achieved through pure Python code.

For example:

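from subprocess import Popen, PIPE
# stdout=PIPE is needed to capture output; communicate() returns (stdout, stderr).
cmd = 'ls -l ./ | awk -F " " \'{if ($5 > 10000) print $0}\' | grep "$USER"'
cmd_result = Popen(cmd, shell=True, stdout=PIPE, text=True).communicate()[0].split('\n')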

Or

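from subprocess import Popen, PIPE
cmd_result = Popen('ls -l ./', shell=True, stdout=PIPE, text=True).communicate()[0].split('\n')
result_lst = []
for result in cmd_result:
    result_items = result.split()
    # Skip the "total N" header and blank lines, which have fewer fields.
    if len(result_items) < 9:
        continue
    if int(result_items[4]) > 10000 and result_items[2] == "user_name":
        result_lst.append(result)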

I am wondering which method is more efficient. In my tests the pure Python version was slower than the pipeline version, but I am not sure whether that means pipes are generally more efficient.

Thank you in advance.

CodePudding user response:

The absolutely best solution to this is to avoid using a subprocess at all.

import os

myuid = os.getuid()

# Each DirEntry can stat itself, and entry.name is the bare file name.
for entry in os.scandir("."):
    st = entry.stat()
    if st.st_size > 10000 and st.st_uid == myuid:
        print(entry.name)
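To check the efficiency claim on your own machine, here is a rough timing sketch using timeit. The function names (list_with_scandir, list_with_ls, compare_listing) are made up for this illustration, and the sketch assumes an ls binary is on PATH; if it is not, only the os.scandir timing is reported. Numbers will vary with the directory size and the system, so treat them as indicative only.

```python
import os
import shutil
import subprocess
import timeit


def list_with_scandir(path="."):
    """List directory entries in-process, without spawning anything."""
    return [entry.name for entry in os.scandir(path)]


def list_with_ls(path="."):
    """List directory entries by spawning `ls` and capturing its output."""
    completed = subprocess.run(
        ["ls", path], capture_output=True, text=True, check=True
    )
    return completed.stdout.splitlines()


def compare_listing(path=".", repeats=20):
    """Return total seconds spent by each approach over `repeats` runs."""
    timings = {
        "scandir": timeit.timeit(lambda: list_with_scandir(path), number=repeats)
    }
    # Only time the subprocess variant when `ls` is actually available.
    if shutil.which("ls"):
        timings["ls"] = timeit.timeit(lambda: list_with_ls(path), number=repeats)
    return timings


if __name__ == "__main__":
    for name, seconds in compare_listing().items():
        print(f"{name}: {seconds:.4f}s")
```

The subprocess variant pays for a process start on every call, which is why it usually loses on small directories.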

In general, if you want to run and capture the output of a command, the simplest by far is subprocess.check_output; but really, don't parse ls output, and, of course, try to avoid superfluous subprocesses like useless greps if efficiency is important.

import subprocess

# In `ls -l` output, $3 is the owner, $5 the size and $9 the file name.
files = subprocess.check_output(
    """ls -l . | awk -v me="$USER" '$5 > 10000 && $3 == me { print $9 }'""",
    text=True, shell=True)

This still has several problems: $4 (the group name) could contain spaces (it does, on my system), which shifts every later field, and $9 will hold only the beginning of the file name if the name contains spaces.
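The field-splitting problems described above can be reproduced without running ls at all. The two sample lines below are invented for illustration (the user, group and file names are not from any real system), and naive_parse is a hypothetical helper that mimics what awk or str.split() does with whitespace-separated fields.

```python
# Invented `ls -l`-style lines; names and sizes are purely illustrative.
SPACE_IN_NAME = "-rw-r--r-- 1 alice staff 20480 Jun 17 10:00 my report.txt"
SPACE_IN_GROUP = "-rw-r--r-- 1 alice Domain Users 20480 Jun 17 10:00 report.txt"


def naive_parse(line):
    """Return (size_field, name_field) the way $5 and $9 would in awk."""
    fields = line.split()
    return fields[4], fields[8]


if __name__ == "__main__":
    # The file name is cut off at the first space.
    print(naive_parse(SPACE_IN_NAME))
    # A space in the group name shifts every later field by one.
    print(naive_parse(SPACE_IN_GROUP))
```

In the first case the name field holds only "my"; in the second the "size" field holds "Users", so int() on it would raise ValueError. Reading the metadata with os.stat avoids both problems.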

If you need to run a process which could produce a lot of output concurrently and fetch its output as it arrives, not when the process has finished, the Stack Overflow subprocess tag info page has a couple of links to questions about how to do that; I am guessing it is not worth the effort for this simple task you are asking about, though it could be useful for more complex ones.
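For reference, a minimal sketch of reading a child's output line by line as it arrives might look like the following. It is one common pattern, not necessarily the one the linked questions recommend; stream_lines is a hypothetical helper name, and the demo command simply runs a second Python interpreter so that the example does not depend on any external tool.

```python
import subprocess
import sys


def stream_lines(cmd):
    """Yield each line of the child's stdout as soon as it is available."""
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, text=True, bufsize=1
    ) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the `with` block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


# Demo child: -u makes the child's stdout unbuffered so lines arrive promptly.
DEMO_CMD = [sys.executable, "-u", "-c", "for i in range(3): print('chunk', i)"]

if __name__ == "__main__":
    for line in stream_lines(DEMO_CMD):
        print("got:", line)
```

Note that many programs buffer their output when it goes to a pipe rather than a terminal, so lines may still arrive in bursts unless the child is told not to buffer.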
