Modify data in shared memory when using Python's ray module

Time:12-03

I am currently trying to parallelize parts of a Python program using the ray module. Unfortunately, as far as I understand, ray does not allow data in shared memory to be modified by default. This means I would need to call numpy.copy() first, which sounds very inefficient to me.

Here is a minimal, and probably very inefficient, example:

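import numpy as np
import ray

@ray.remote
def mod_arr(arr):
    # arr arrives read-only, so work on a private, writable copy
    arr_cp = np.copy(arr)
    # modify the copy in place (here: turn the zeros into ones)
    arr_cp += 1
    return arr_cp

ray.init()
arr = np.zeros((2, 3, 4))
# run the task remotely and fetch the result
arr = ray.get(mod_arr.remote(arr))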

If I omit the np.copy() in mod_arr() and try to modify arr directly instead, I get the following error:

ValueError: output array is read-only

Am I using ray completely wrong, or is it not the correct tool for my purpose?

CodePudding user response:

Because of the global interpreter lock (GIL) in CPython, multiple threads cannot execute Python code in parallel. True parallelism is therefore achieved either outside the interpreter, when a module releases the GIL, or by using multiple processes. Ray takes the second route: tasks run in separate worker processes, and NumPy arrays are handed to them as read-only views of the shared-memory object store, which is why the in-place write fails.

With multiple processes, copying memory like this is normal practice. The same holds in pure functional programming, where function arguments are immutable: you copy the data whenever you need to change it. This brings a lot of stability, at a performance cost that is usually acceptable.

Basically, treat these functions as pure functions.
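
As a rough illustration of the "pure function" idea, here is a minimal sketch that does not need Ray at all. It assumes that a NumPy array marked non-writable with setflags(write=False) is a fair stand-in for the read-only array a worker receives; the names add_one_pure and shared are made up for this example.

```python
import numpy as np

def add_one_pure(arr):
    """Return a new array and never write to the (possibly read-only) input."""
    out = np.copy(arr)  # private, writable copy
    out += 1            # the in-place update touches only the copy
    return out

# Stand-in for an array that lives in shared memory (assumption):
# marking it non-writable mimics the read-only view a worker receives.
shared = np.zeros((2, 3, 4))
shared.setflags(write=False)

in_place_error = None
try:
    shared += 1         # in-place write on a read-only array
except ValueError as err:
    in_place_error = str(err)

result = add_one_pure(shared)  # works, and leaves the input untouched
```

With Ray, the same function body could sit under @ray.remote and be called as in the question. If the result does not depend on the contents of the input at all (for example, an array of all ones), allocating a fresh array with np.ones(arr.shape) should be enough, and the copy can probably be dropped.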
