Access and modify a 2D array using Python multiprocessing

I'm just starting to work with Python multiprocessing. The problem I'm facing can be summarized as follows.

Basically, I am trying to have a 2D array accessed and modified by several different processes.

I found a similar question here on StackOverflow from a year ago by Alam (How to create a shared 2D array in python multiprocessing), but it did not receive a clear answer.

Here is a code snippet, similar to the one presented by Alam, to help the discussion.

import multiprocessing as mp
import numpy as np
import ctypes as c

n = 2
m = 3

def addData(array, process_number):
    n,m = np.shape(array)
    i=0
    
    for nn in range(n):
        for mm in range(m):
            array[nn][mm] += i
            i = i + 1

    print("Array after process " + str(process_number))
    print(array)

if __name__=='__main__':

    mp_arr=mp.Array('i', n*m)
    arr = np.frombuffer(mp_arr.get_obj(),c.c_int)

    u = arr.reshape((n, m))
    
    print("Array at the initial state: ")
    print(u)

    p1=mp.Process(target=addData,args=(u,1))
    p2=mp.Process(target=addData,args=(u,2))
    
    p1.start()
    p2.start()
    
    p1.join()
    p2.join()
    
    print("Array at the final state: ")
    print(u)

The output (originally posted as an image) shows that the final state of the 2D array is still [[0, 0, 0], [0, 0, 0]], whereas I would like it to be [[0, 2, 4], [6, 8, 10]].

Answer:

When I ran this under Windows with Python 3.8.5, I got the same results as you, but under Linux with Python 3.9.7 it worked. In any case, the following technique seems to work everywhere.
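A likely explanation for the platform difference is the process start method. Windows starts child processes with "spawn", which pickles every argument passed to mp.Process, and pickling a numpy array copies its data instead of keeping the link to the shared buffer. Linux defaults to "fork", where the child inherits the parent's memory mappings, so the numpy view still points at the shared memory. The sketch below only imitates the pickling step in a single process; the function name pickled_view_is_a_copy is made up for illustration and is not part of the original code.

```python
import multiprocessing as mp
import pickle

import numpy as np


def pickled_view_is_a_copy(n=2, m=3):
    """Return (shares_memory, original_value) after writing to a pickled clone."""
    # Shared buffer of C ints; shared ctypes memory starts out zeroed.
    mp_arr = mp.Array('i', n * m)
    u = np.frombuffer(mp_arr.get_obj(), dtype=np.intc).reshape((n, m))

    # Roughly what the "spawn" start method does with each Process argument.
    clone = pickle.loads(pickle.dumps(u))
    clone[0, 0] = 99  # write through the clone only

    # The clone owns its own memory, so the shared array is left untouched.
    return bool(np.shares_memory(u, clone)), int(u[0, 0])


if __name__ == '__main__':
    print(pickled_view_is_a_copy())
```

If this reasoning holds, passing the shared ctypes array itself to the child processes and rebuilding the numpy view there avoids the copy on every platform.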

We start off with a numpy array, which seems more natural, create a shared memory array from it, and then recreate the numpy array using the shared memory as its buffer. The other difference is that we pass each process a reference to the shared memory together with the shape of the numpy array, and the process rebuilds the numpy array itself using the helper function to_numpy_array:

import multiprocessing as mp
import numpy as np
import ctypes as c

n = 2
m = 3

def addData(shared_array, shape, lock, process_number):
    array = to_numpy_array(shared_array, shape)
    n,m = shape
    i=0

    for nn in range(n):
        for mm in range(m):
            with lock:
                array[nn][mm] += i
            i = i + 1

    print("Array after process " + str(process_number))
    print(array, flush=True)

def to_shared_array(arr, ctype):
    shared_array = mp.Array(ctype, arr.size, lock=False)
    temp = np.frombuffer(shared_array, dtype=arr.dtype)
    temp[:] = arr.flatten(order='C')
    return shared_array

def to_numpy_array(shared_array, shape):
    '''Create a numpy array backed by a shared memory Array.'''
    arr = np.ctypeslib.as_array(shared_array)
    return arr.reshape(shape)

if __name__=='__main__':

    # Start with a numpy array!
    mp_arr = np.zeros((n, m), dtype=np.int32)
    shared_array = to_shared_array(mp_arr, c.c_int32)
    # you have to now use the shared array as the base
    u = to_numpy_array(shared_array, mp_arr.shape)

    print("Array at the initial state: ")
    print(u)

    lock = mp.Lock()

    p1=mp.Process(target=addData,args=(shared_array, mp_arr.shape, lock, 1))
    p2=mp.Process(target=addData,args=(shared_array, mp_arr.shape, lock, 2))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print("Array at the final state: ")
    print(u)

Prints:

Array at the initial state:
[[0 0 0]
 [0 0 0]]
Array after process 2
[[0 1 2]
 [3 4 5]]
Array after process 1
[[ 0  2  4]
 [ 6  8 10]]
Array at the final state:
[[ 0  2  4]
 [ 6  8 10]]
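To check that to_numpy_array returns a view on the shared buffer rather than a copy, the two helpers can be exercised in a single process, without starting any workers. This is a minimal sketch: the helpers are repeated so the snippet runs on its own, and view_check is a hypothetical name introduced only for this test.

```python
import ctypes as c
import multiprocessing as mp

import numpy as np


def to_shared_array(arr, ctype):
    """Copy a numpy array into a new lock-free shared ctypes array."""
    shared_array = mp.Array(ctype, arr.size, lock=False)
    temp = np.frombuffer(shared_array, dtype=arr.dtype)
    temp[:] = arr.flatten(order='C')
    return shared_array


def to_numpy_array(shared_array, shape):
    """Create a numpy array backed by a shared memory Array."""
    return np.ctypeslib.as_array(shared_array).reshape(shape)


def view_check():
    """Write through the numpy view and read back from the shared buffer."""
    source = np.arange(6, dtype=np.int32).reshape((2, 3))
    shared_array = to_shared_array(source, c.c_int32)
    u = to_numpy_array(shared_array, source.shape)

    u[1, 1] = 70  # flat index 4 in row-major (C) order

    # The shared buffer sees the write; the original numpy array does not.
    return shared_array[4], int(source[1, 1]), u.tolist()


if __name__ == '__main__':
    print(view_check())
```

The write lands in the shared buffer but not in the starting array, which is why the main block has to keep working with u (the view) and not with the array it started from.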