I am currently trying to parallelize parts of a Python code using the ray module. Unfortunately, ray does not allow modifying data in shared memory by default (at least according to my understanding). This means I would first need to perform a numpy.copy(), which sounds very inefficient to me. Here is a (probably very inefficient) example:
import numpy as np
import ray

@ray.remote
def mod_arr(arr):
    arr_cp = np.copy(arr)
    arr_cp = np.ones(arr_cp.shape)
    return arr_cp

ray.init()
arr = np.zeros((2, 3, 4))
arr = ray.get(mod_arr.remote(arr))
If I omit the np.copy() in mod_arr() and try to modify arr in place instead, I get the following error:

ValueError: output array is read-only
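For reference, a similar error can be reproduced with plain NumPy by clearing the array's writeable flag, which mimics the read-only view a ray worker receives (a minimal sketch, no ray required):

```python
import numpy as np

arr = np.zeros((2, 3, 4))
arr.flags.writeable = False  # simulate the read-only array seen inside the worker

try:
    arr[0, 0, 0] = 1.0  # in-place modification fails
except ValueError as e:
    print(e)  # assignment to a read-only array raises ValueError

# Copying first yields a private, writable array
arr_cp = np.copy(arr)
arr_cp[0, 0, 0] = 1.0
```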
Am I using ray completely wrong, or is it simply not the right tool for my purpose?
CodePudding user response:
Because of Python's GIL, multiple threads cannot execute Python bytecode in parallel. Therefore, true parallelism is achieved either outside of Python, when a module releases the GIL, or by using multiple processes, which is what both multiprocessing and ray do. Separate processes do not share memory by default, so arguments must be serialized and copied to the workers; ray reduces this cost by keeping arrays in a shared-memory object store and handing workers zero-copy, read-only views, which is exactly why your in-place modification fails.
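To see why the copy is inherent to process-based parallelism: multiprocessing ships every argument to a worker by pickling it, so the worker always operates on an independent copy. A minimal sketch using pickle directly, without spawning a worker pool:

```python
import pickle
import numpy as np

arr = np.zeros((2, 3, 4))

# This round-trip is essentially what happens to every argument
# passed to a multiprocessing worker process.
arr_in_worker = pickle.loads(pickle.dumps(arr))

arr_in_worker[:] = 1.0  # mutate the "worker's" copy in place

print(arr[0, 0, 0])            # 0.0 -- the parent's array is untouched
print(arr_in_worker[0, 0, 0])  # 1.0
```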
In multiprocessing, this memory copy is a normal part of the workflow. The same holds in pure functional programming, where arguments to functions are immutable: the solution is to copy the data whenever you need to modify it. This brings significant advantages in stability, at an acceptable performance cost.
Basically, treat these functions as pure functions.
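In this style, the function never mutates its input: it copies, modifies the copy in place, and returns it. A sketch of the pattern (shown without the @ray.remote decorator so it runs stand-alone; with ray, the body stays identical because the function is pure, and filling the copy in place avoids the wasteful extra allocation of np.ones after np.copy in the original example):

```python
import numpy as np

# With ray, this function would additionally carry the @ray.remote
# decorator; nothing in the body needs to change.
def mod_arr(arr):
    arr_cp = np.copy(arr)   # private, writable copy of the (read-only) input
    arr_cp[:] = 1.0         # in-place modification of the copy is allowed
    return arr_cp

arr = np.zeros((2, 3, 4))
result = mod_arr(arr)

print(arr[0, 0, 0])     # 0.0 -- the input array is untouched
print(result[0, 0, 0])  # 1.0
```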