subprocess and exchanging json: How can I use read() on stdin non-blockingly?


I have a main process that sends JSON-structured data to a subprocess. The subprocess works with this data and reports its progress back to the main process as a percentage (which should update a progress bar in the user interface).

The problem is that the output of the subprocess is only received by the main process once the subprocess has already finished. I suppose it blocks on the read() call. How can I get the main process to work with the response as soon as the child process posts a line to its stdout?

Here's the minimal working example:

parent.py

from json import dumps
import subprocess
from time import sleep

lines_to_exchange = ["this is line one", "this is line two", "this is line three", "this is line four", "this is line five"]
command = ["python", "-u", "./child.py"]
print("start process")


sub = subprocess.Popen(command, text=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

sub.stdin.write(dumps(lines_to_exchange))
sub.stdin.close()


while True:
    sleep(0.1)
    stdout = sub.stdout.read()
    print(stdout)

    if sub.poll() is not None:
        print("process completed")
        break

child.py

from time import sleep
from json import loads
import sys
lines = loads(input())

for line in lines:
    sleep(1)
    print(line)
    sys.stdout.flush()

I am working on Windows with Python 3.10 and the PyCharm IDE.

CodePudding user response:

tl;dr: Use os.set_blocking(fd, False) for non-blocking reads.


What a great question! Kudos for offering an MRE in a nice educational way.


I really like the {while True, sleep epsilon} safety measure. Always a good practice to unconditionally sleep a moment, so buggy code won't accidentally peg a core.


nit: The explicit .flush() is very nice. Some folks like to make it part of the print: print(line, flush=True)

The -u unbuffered flag is redundant with the explicit flush. But sure, I get it: you were throwing everything at it in hopes of making it work, cool.

In any event, the child is behaving perfectly.


The parent is almost correct; you're very close. What's tripping you up is that .read() with no size argument reads all the way to EOF, so it cannot return until the child exits and closes its end of the pipe.

To see this, change .read() to e.g. .read(10).
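
For instance, keeping the rest of your parent.py unchanged, the loop becomes (a quick sketch):

    while True:
        sleep(0.1)
        stdout = sub.stdout.read(10)   # returns after 10 characters (or EOF),
        if stdout:                     # not at end of process
            print(stdout, end="")
        if sub.poll() is not None:
            print("process completed")
            break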

For an explicit read size, the stdlib exposes select.PIPE_BUF (Unix-only; the number of bytes a pipe write is guaranteed to deliver atomically), which is a handy choice:

>>> import select
>>> select.PIPE_BUF
512

What you really want is non-blocking I/O in the parent. Here is some setup that I have tested as working elsewhere (imports added for completeness):

    import io
    import os
    from subprocess import Popen, PIPE

    with Popen(cmd, stdout=PIPE) as sub:    # cmd: your command list
        stdout = io.TextIOWrapper(sub.stdout)
        fd = stdout.fileno()
        os.set_blocking(fd, False)

Feel free to discard the complexity of the text wrapper if you don't need it. The critical item is setting non-blocking, as that alters the behavior of .read(). One caveat for your platform: os.set_blocking() only gained support for pipes on Windows in Python 3.12, so on Windows with Python 3.10 it is not available.

We have an opportunity to replace the "polling" sleep() with a call to select(), which will pause for exactly the right amount of time and wake up the moment data is ready. (Another platform caveat: on Windows, select() accepts only sockets, not pipes, so this part is POSIX-only.)

    from select import select

    select([fd], [], [], 30.0)   # wakes as soon as fd has data, or at 30 s

That last safety-valve parameter can be any value suitably large, say 30 seconds if you believe child will always have something to say within that interval.
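
If the safety valve expires with nothing to read, select() simply returns three empty lists, so the two outcomes are easy to tell apart (a sketch):

    readable, _, _ = select([fd], [], [], 30.0)
    if not readable:
        print("no output for 30 seconds; is the child stuck?")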

Now when you .read(), or perhaps .read(PIPE_BUF) with an explicit size, it completes immediately since it is non-blocking. This means it can come back empty-handed (an empty result, None, or a BlockingIOError, depending on which layer you read from), and it will often return fewer bytes than you asked for. That's OK: just process what you did get and go back to looping while the child is still alive.
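
For example, reading the raw descriptor with os.read makes the possible outcomes explicit (a sketch, assuming fd from the setup above; handle_chunk is a hypothetical callback, not from your code):

    import os

    try:
        chunk = os.read(fd, 512)   # returns at once with whatever is available
    except BlockingIOError:
        chunk = None               # nothing to read right now
    if chunk == b"":
        print("EOF: child closed its end of the pipe")
    elif chunk:
        handle_chunk(chunk)        # hypothetical; may get fewer than 512 bytes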


Note that the parent code you posted has a race condition: it may not read everything the child said. Do some final reads after the child exits to ensure that nothing is lost. The parent's responsibility is to keep reading until EOF on that file descriptor.
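
Putting the pieces together, here is one way the revised parent.py could look (a POSIX-only sketch, reading the raw descriptor with os.read so that EOF arrives directly as empty bytes):

    import os
    from json import dumps
    from select import select
    from subprocess import Popen, PIPE

    lines_to_exchange = ["this is line one", "this is line two",
                         "this is line three", "this is line four",
                         "this is line five"]

    with Popen(["python", "-u", "./child.py"],
               stdin=PIPE, stdout=PIPE) as sub:
        sub.stdin.write(dumps(lines_to_exchange).encode())  # binary pipe, so encode
        sub.stdin.close()
        fd = sub.stdout.fileno()
        os.set_blocking(fd, False)
        while True:
            # sleep until data arrives, waking at least every 30 s
            ready, _, _ = select([fd], [], [], 30.0)
            if not ready:
                continue                  # safety valve expired; child still quiet
            chunk = os.read(fd, 512)      # won't block: select said data is ready
            if not chunk:                 # b"" is EOF: child closed its stdout
                break
            print(chunk.decode(), end="")
    print("process completed")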
