I have some data (bytes) representing a PNG image that comes over a pyzmq socket from an executable running in the background. I transform it into a numpy array using the following command:
decoded = np.asarray(im.open(io.BytesIO(data)))
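For context, with the imports spelled out the decode step looks like this (a minimal sketch; I am assuming im is PIL's Image module, and the socket type and endpoint below are illustrative):

import io

import numpy as np
import zmq
from PIL import Image as im

context = zmq.Context()
sock = context.socket(zmq.PULL)       # socket type is an assumption
sock.connect('tcp://127.0.0.1:5555')  # illustrative endpoint
data = sock.recv()                    # raw PNG bytes from the background executable
decoded = np.asarray(im.open(io.BytesIO(data)))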
When I do this from the command line, this line of code takes under a millisecond to run (as also verified with Python's profiler). So far so good.
Now I need to run this same line of code from my app, called Heron. This is a bit tricky to explain, but practically Heron is a graphical framework that spawns multiple processes, each connected to the others through 0MQ sockets for data transfer. The user puts together code for each process (following Heron's API), and Heron creates a graphical node for it, is responsible for spinning up and closing down a subprocess to run that code in, and receives and sends messages between this subprocess and all the others Heron has spawned. For all intents and purposes, the line of code I mentioned above just runs in its own process spun up by Heron (instead of being called from the command line). Everything else is identical as far as I can see.
When I run this line from inside a Heron subprocess, it takes about 90ms to run (again as verified by the Python profiler). The same calls (like PIL's PngImagePlugin.py:198(call)) now take 16ms instead of well under a millisecond in the previous case.
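The timings come from Python's built-in profiler, along these lines (a sketch; cProfile.run evaluates the string in __main__'s namespace, so data, np, im and io need to be defined there):

import cProfile

# Profile just the decode line; sorting by cumulative time makes the
# PIL internals (e.g. PngImagePlugin) visible with their per-call cost.
cProfile.run('np.asarray(im.open(io.BytesIO(data)))', sort='cumtime')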
I also tried creating a subprocess from the command line that runs the same line, but unhelpfully it ran as fast as when run straight from the command line (under 1ms). So it seems the problem is not directly related to the code running in a subprocess, but somehow to the way it runs in the subprocess specifically created by Heron.
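That control test looked roughly like this (the script name is hypothetical; the script contains the decode line plus the profiling shown above):

import subprocess
import sys

# Run the same decode script as a plain subprocess, outside Heron.
subprocess.Popen([sys.executable, 'decode_test.py']).wait()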
Heron's main process (which spawns all the others) is a dearpygui process (so it shows a GUI).
I understand that the question is very vague, especially since I am the one who wrote Heron (so I should know better how exactly it behaves), but for the life of me I cannot see why code running inside a subprocess started by a GUI-based process (Heron) would behave any differently.
Heron spins up the subprocess as follows:
kwargs = {'start_new_session': True} if os.name == 'posix' else \
    {'creationflags': subprocess.CREATE_NEW_PROCESS_GROUP}
pid = subprocess.Popen(new_arguments_list, **kwargs).pid
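A self-contained version of that spawn, for anyone who wants to try it (new_arguments_list is whatever command line Heron builds for a Node's worker script; the stand-in below is hypothetical):

import os
import subprocess
import sys

new_arguments_list = [sys.executable, 'worker_script.py']  # hypothetical stand-in
kwargs = {'start_new_session': True} if os.name == 'posix' else \
    {'creationflags': subprocess.CREATE_NEW_PROCESS_GROUP}
pid = subprocess.Popen(new_arguments_list, **kwargs).pid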
All of the above happens on Windows 10 with Python 3.9. I haven't tried Linux yet.
I am posting this hoping that some of you may have come across the same problem of different execution speeds of Python code under different circumstances and might give me some pointers as to where to look for solutions.
Expectation and what I did:
I was expecting the same speed of execution for the same line of code.
I tried running the code in its own subprocess, but that didn't produce a difference, so it must be something more specific to what Heron is doing.
I have also disabled the garbage collector but it made no difference.
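For completeness, disabling it is just the standard switch (it rules out garbage-collection pauses as the cause):

import gc

gc.disable()  # no GC pauses; made no measurable difference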
CodePudding user response:
After @Jérôme's initial comment I tried pinning the process that runs the problematic lines of code to a single CPU core (I have 8 on my machine).
I did this with the following code:
import psutil

affinity = [int(affinity)]  # affinity arrives as a string argument, e.g. '2'
proc = psutil.Process()  # no pid argument means the current process
aff_before = proc.cpu_affinity()
proc.cpu_affinity(affinity)  # pin this process to the given core
aff_after = proc.cpu_affinity()
print('Setting CPU affinity of Com process with PID: {} (and its Worker process) from {} to {}'.format(proc.pid, aff_before, aff_after))
This immediately solved the problem, and now the time it takes the lines of code to execute is identical to when they are called from the command line.
Heron spawns a large number of processes: 2 * the number of Nodes in its computational graph. Usually that is anything between 3 and 10 Nodes, so 6 to 20 processes at any point in time.
I still have no clue why this would cause some lines of code to slow down by two orders of magnitude. It would be interesting to know (but that's a problem for another time). In any case, problem solved (and now I have added to Heron the ability to pin processes to cores :) ).
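For reference, pinning a freshly spawned worker from the parent side can be done along these lines (a sketch; the core index and, as before, the worker script are illustrative):

import subprocess
import sys

import psutil

core = 3  # illustrative core index
child = subprocess.Popen([sys.executable, 'worker_script.py'])  # hypothetical worker
psutil.Process(child.pid).cpu_affinity([core])  # pin the new process to one core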