Multiprocess inherently shared memory is no longer working on Python 3.10 (coming from 3.6)

Time:01-03

I understand there are a variety of techniques for sharing memory and data structures between processes in Python. This question is specifically about the inherently shared memory in Python scripts that existed in Python 3.6 but seems to no longer exist in 3.10. Does anyone know why this changed, what the change I'm observing actually is, and whether it's possible to bring the old behavior back in 3.10? I've upgraded my Mac to Monterey, which no longer supports Python 3.6, so I'm forced to upgrade to either 3.9 or 3.10.

Note: I tend to develop on Mac and run production on Ubuntu. Not sure if that factors in here. Historically with 3.6, everything behaved the same regardless of OS.

Make a simple project with the following python files

myLibrary.py

MyDict = {}

test.py

import threading
import time
import multiprocessing

import myLibrary


def InitMyDict():
    myLibrary.MyDict = {'woot': 1, 'sauce': 2}
    print('initialized myLibrary.MyDict to ', myLibrary.MyDict)


def MainLoop():
    numOfSubProcessesToStart = 3
    for i in range(numOfSubProcessesToStart):
        t = threading.Thread(
            target=CoolFeature,
            args=())
        t.start()

    while True:
        time.sleep(1)


def CoolFeature():
    MyProcess = multiprocessing.Process(
        target=SubProcessFunction,
        args=())
    MyProcess.start()


def SubProcessFunction():
    print('SubProcessFunction: ', myLibrary.MyDict)


if __name__ == '__main__':
    InitMyDict()
    MainLoop()

When I run this on 3.6 it has a significantly different behavior than 3.10. I do understand that a subprocess cannot modify the memory of the main process, but it is still super convenient to access the main process' data structure that was previously set up as opposed to moving every little tiny thing into shared memory just to read a simple dictionary/int/string/etc.

Python 3.10 output:

python3.10 test.py 
initialized myLibrary.MyDict to  {'woot': 1, 'sauce': 2}
SubProcessFunction:  {}
SubProcessFunction:  {}
SubProcessFunction:  {}

Python 3.6 output:

python3.6 test.py 
initialized myLibrary.MyDict to  {'woot': 1, 'sauce': 2}
SubProcessFunction:  {'woot': 1, 'sauce': 2}
SubProcessFunction:  {'woot': 1, 'sauce': 2}
SubProcessFunction:  {'woot': 1, 'sauce': 2}

Observation:

Notice that in 3.6, the subprocess can view the value that was set from the main process. But in 3.10, the subprocess sees an empty dictionary.

CodePudding user response:

In short, since 3.8, CPython uses the spawn start method by default on macOS. Before that, it used the fork method.

On other UNIX platforms such as Linux, fork is still the default start method in 3.10, which means that every new multiprocessing process is a copy of the parent at the time of the fork. That is why the same script can still behave the old way on Ubuntu.

The spawn method means that it starts a new Python interpreter for each new multiprocessing process. According to the documentation:

The child process will only inherit those resources necessary to run the process object’s run() method.

It will import your program into this new interpreter, so starting processes and similar side effects should only be done from within the if __name__ == '__main__': block!

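This means you cannot count on variables from the parent process being available in the children, unless they are module-level constants, which are recreated when the module is imported. In the example above, each child re-imports myLibrary and gets the original empty MyDict = {}; InitMyDict() is only called inside the if __name__ == '__main__': block, so it never runs in the children.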

So the change is significant.
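To check which start method a given interpreter uses, you can ask the standard library directly. This is only a minimal sketch; describe_start_method is a hypothetical helper name, not something from the question.

```python
import multiprocessing


def describe_start_method():
    """Return the active start method and whether children inherit parent state."""
    # Note: get_start_method() also fixes the method for this process, so a
    # later set_start_method() call would need force=True.
    method = multiprocessing.get_start_method()
    # Only "fork" copies the parent's memory, including runtime changes to
    # module globals such as myLibrary.MyDict in the question.
    inherits_parent_state = method == "fork"
    return method, inherits_parent_state


if __name__ == "__main__":
    method, inherits = describe_start_method()
    print("start method:", method)
    print("children see parent's runtime globals:", inherits)
```

On macOS with Python 3.8 or later this should report spawn; on Linux with 3.10 it should report fork.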

What can be done?

You could have the parent write the information to be shared to a file, e.g. in JSON format before it starts other processes. Then the children could simply read this. That is probably the simplest solution.
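A rough sketch of the file-based approach, assuming the shared data is JSON-serializable. The helper names (write_shared, read_shared, child) and the file name shared.json are made up for illustration.

```python
import json
import multiprocessing
import os
import tempfile


def write_shared(path, data):
    """Parent side: dump the data to a JSON file before starting children."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)


def read_shared(path):
    """Child side: load the data back from the JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def child(path):
    # Only the path is passed to the child; it works with any start method.
    print("child sees:", read_shared(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.json")  # hypothetical file name
        write_shared(path, {"woot": 1, "sauce": 2})
        proc = multiprocessing.Process(target=child, args=(path,))
        proc.start()
        proc.join()
```

The children get a snapshot: changes the parent makes after writing the file are not seen unless the file is rewritten and read again.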

Using a multiprocessing.Manager would allow you to share a dict between processes. There is however a certain amount of overhead associated with this.
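A minimal sketch of the Manager approach. The names child and run_demo are hypothetical, and the optional ctx parameter is only there so the caller can pick a start method; by default the platform's own is used.

```python
import multiprocessing


def child(shared):
    # Every access goes through a proxy to the manager's server process,
    # which is where the overhead comes from.
    print("child sees:", dict(shared))
    shared["seen_by_child"] = True  # writes are visible to the parent too


def run_demo(initial, ctx=None):
    """Share a dict with one child via a manager and return its final content."""
    ctx = ctx or multiprocessing.get_context()
    with ctx.Manager() as manager:
        shared = manager.dict(initial)
        proc = ctx.Process(target=child, args=(shared,))
        proc.start()
        proc.join()
        # Copy the content out before the manager shuts down.
        return dict(shared)


if __name__ == "__main__":
    print(run_demo({"woot": 1, "sauce": 2}))
```

Unlike the file approach, the dict proxy is explicitly passed to the child as an argument, so it does not depend on module globals being inherited.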

Or you could try calling multiprocessing.set_start_method("fork") before creating any processes or pools and see whether it works without crashing in your case. That would revert to the pre-3.8 behavior on macOS. But as documented in this bug, there are real problems with using the fork method on macOS. Reading the issue suggests that fork might be OK as long as you don't use threads.
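A sketch of opting back into fork, assuming a platform where fork is available. It uses multiprocessing.get_context("fork"), which selects the same start method as set_start_method("fork") but only for objects created from that context. SHARED stands in for myLibrary.MyDict, and the function names are hypothetical.

```python
import multiprocessing

SHARED = {}  # stands in for myLibrary.MyDict from the question


def report_to_parent(conn):
    # With fork, the child starts as a copy of the parent, so it sees
    # whatever SHARED contained at the moment of the fork.
    conn.send(dict(SHARED))
    conn.close()


def read_in_forked_child():
    """Start a forked child and return the content of SHARED as it sees it."""
    # Raises ValueError on platforms without fork (e.g. Windows).
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=report_to_parent, args=(child_conn,))
    proc.start()
    child_conn.close()  # the parent only needs its own end of the pipe
    result = parent_conn.recv()
    proc.join()
    return result


if __name__ == "__main__":
    SHARED.update({"woot": 1, "sauce": 2})
    print("forked child sees:", read_in_forked_child())
```

The caveats about fork on macOS still apply; this only shows how to select the method, not that it is safe for a given program.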
