Python: Run function in parallel

Is there a way to speed this up by running it in parallel? Most of the processing time is spent in scipy.ndimage.map_coordinates.

import multiprocessing
import numpy as np
import scipy.ndimage

pool = multiprocessing.Pool()
n=6
x0=350
y0=350
r=150
num=10000
#z = np.gradient(sensor_dat, axis=1)
z = np.random.randn(700,700)

def func1(i):
    x1, y1 = x0 + r * np.cos(2 * np.pi * i / n), y0 + r * np.sin(2 * np.pi * i / n)
    x, y = np.linspace(x0, x1, num), np.linspace(y0, y1, num)
    zi = scipy.ndimage.map_coordinates(z, np.vstack((y, x)))
    return zi

[func1(i) for i in range(36)]
#pool.map(func1,range(36))

Following "Is there a simple process-based parallel map for python?", I tried pool.map(func1, range(36)), but got the error: Can't pickle <function func1 at 0x0000019408E6F438>: attribute lookup func1 on __main__ failed

I also found "How to accelerate scipy.map_coordinates for multiple interpolations?", but I don't think it is relevant here: scipy.ndimage.map_coordinates accounts for most of the processing time, and I don't think that approach will speed it up in my case.

CodePudding user response:

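Yes, you can. Follow the programming guidelines in the multiprocessing documentation: define the worker function at module level so that it can be pickled, and create the pool under an if __name__ == '__main__': guard. Then measure whether multiple workers are actually faster for your workload.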

Here is the code that I tested with:

from multiprocessing import Pool
import numpy as np
from scipy import ndimage
from time import time

n=6
x0=350
y0=350
r=150
num=10000
#z = np.gradient(sensor_dat, axis=1)
z = np.random.randn(700,700)

def f(i):
    x1, y1 = x0 + r * np.cos(2 * np.pi * i / n), y0 + r * np.sin(2 * np.pi * i / n)
    x, y = np.linspace(x0, x1, num), np.linspace(y0, y1, num)
    zi = ndimage.map_coordinates(z, np.vstack((y, x)))
    return zi

if __name__ == '__main__':
    begin = time()
    [f(i) for i in range(36)]
    end = time()
    print('Single worker took {:.3f} secs'.format(end - begin))

    begin = time()
    with Pool() as p:
        p.map(f, list(range(36)))
    end = time()
    print('Parallel workers took {:.3f} secs'.format(end - begin))

This yields the following output on my machine:

Single worker took 0.793 secs
Parallel workers took 0.217 secs
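
If process start-up overhead eats into the gain on your machine, a different option (plain batching, not multiprocessing) is to pass the coordinates of all lines to a single map_coordinates call. With the default order=3, map_coordinates spline-prefilters the input array on every call, so one call does that work once instead of 36 times. The following is only a minimal sketch under the same z, x0, y0, r, n and num as above; line_coords and profiles_batched are hypothetical helper names, and whether it beats the pool version would need to be measured.

```python
import numpy as np
from scipy import ndimage

n = 6
x0, y0, r, num = 350, 350, 150, 10000
z = np.random.randn(700, 700)  # stand-in data, as in the question

def line_coords(i):
    """Return the (row, col) sample coordinates of line i, shape (2, num)."""
    angle = 2 * np.pi * i / n
    x = np.linspace(x0, x0 + r * np.cos(angle), num)
    y = np.linspace(y0, y0 + r * np.sin(angle), num)
    return np.vstack((y, x))

def profiles_batched(indices):
    """Interpolate all requested lines with one map_coordinates call."""
    indices = list(indices)
    # Concatenate the per-line coordinates side by side: shape (2, len(indices) * num)
    coords = np.concatenate([line_coords(i) for i in indices], axis=1)
    flat = ndimage.map_coordinates(z, coords)
    # One row per line, in the same order as `indices`
    return flat.reshape(len(indices), num)

if __name__ == '__main__':
    profiles = profiles_batched(range(36))
    print(profiles.shape)
```

Row k of the result corresponds to what f(k) returns in the code above, so the two approaches can be compared directly with np.allclose before timing them.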