Run two nested for loops in parallel to create matrix-CodePudding

I've written a method that takes in an integer "n" and creates a square matrix where the values of each element are dictated by their respective i,j indices.

When I build a small matrix 30x30 it works just fine, but when I try to do something larger like 1000x1000 it takes very long. Is there any way that I can speed it up with multiprocessing?

def createMatrix(n):
    matrix = []
    for j in range(1,n 1):
        row = []
        for i in range(1,n 1):
            value = 1/(i j-1)
            row.append(value)
        matrix.append(row)
    return np.array(matrix)

CodePudding user response：

Parallelizing two computation-bound for loops in Python is not trivial because of GIL. The good news is that your case is perfectly vectorizeable:

def createMatrix(n):
    return 1 / (np.arange(n)[None, :]   np.arange(n)[:, None]   1)

Explanation:

essentially, your formula for the matrix is X[row][column] = 1/(row column-1), where rows and columns are 1-based
np.arange(n) creates a range that can be used for rows or columns
[None, :] and [:, None] turn it into a 2d array, 1 x n or n x 1
numpy then broadcasts dimensions, replicating row and column indexes to match dimensions - thus, implicitly tiling both into n x n when added
since both ranges are 0-based, using 1 instead of -1

As a rule of thumb, it is almost never a good idea to use for loops on numpy arrays. A vectorized approach (i.e. matrix form computations) is orders of magnitude faster.

CodePudding user response：

Your code can be written in another style, accelerated by numba library in a parallel no python mode:

import numba as nb

@nb.njit("float64[:, ::1](int64)", parallel=True, fastmath=True)
def createMatrix(n):
    matrix = np.zeros((n, n))
    for j in nb.prange(1, n   1):
        for i in range(1, n   1):
            matrix[j - 1, i - 1] = 1 / (i   j - 1)
    return matrix

This solution will be faster than the Marat answer above 2 times for n = 1000 (Benchmark)

CodePudding user response：

It's not a good idea to use fors to fill a list then convert it to a matrix. the operation that you have can be vectorized with numpy from scratch. if you think that given the i,j, M(i,j) = 1/(j i-1) considering that both indices starts at 1.

Here's my proposal :

def createMatrix2(n):
    arr =np.arange(1,n 1)
    xx,yy = np.meshgrid(arr,arr)
    matrix = 1/(xx yy-1)
    return matrix

looking at Marat answer, I think his/her it's better, so tested the 3 methods:

EDIT: added wwii method as createMatrix4 (correcting the errors):

import numpy as np
from time import time

def createMatrix1(n):
    matrix = []
    for j in range(1,n 1):
        row = []
        for i in range(1,n 1):
            value = 1/(i j-1)
            row.append(value)
        matrix.append(row)
    return np.array(matrix)

def createMatrix2(n):
    arr =np.arange(1,n 1)
    xx,yy = np.meshgrid(arr,arr)
    matrix = 1/(xx yy-1)
    return matrix

def createMatrix3(n):
    """Marat's proposed matrix"""
    return 1 / (1   np.arange(n)[None, :]   np.arange(n)[:, None])

def createMatrix4(n):
    """ wwii method"""
    i,j = np.ogrid[1:n,1:n]
    return 1/(i j-1)
#test all the three methods
n = 10000

t1 = time()
m1 = createMatrix1(n)
t2 = time()
m2 = createMatrix2(n)
t3 = time()
m3 = createMatrix3(n)
t4 = time()
m4 = createMatrix4(n)
t5 = time()
print(np.allclose(m1,m2))
print(np.allclose(m1,m3))
print(np.allclose(m1,m4))
print("Matrix 1 (OP): ",t2-t1)
print("Matrix 2: (mine)",t3-t2)
print("Matrix 3: (Marat)",t4-t3)
print("Matrix 4: (wwii)",t5-t4)
# the output is:
#True
#True
#True
#Matrix 1 (OP):  18.4886577129364
#Matrix 2: (mine) 1.005324363708496
#Matrix 3: (Marat) 0.43033909797668457
#Matrix 4: (wwii) 0.5138359069824219

So Marat's solution is faster. As general comments:

Try to avoid fors loops
Think your problem as operation with indices and dessing operations with numpy arrays directly.

For last, given Marat's answer I thought my proposal is a easier to read, and understand. But it's just a subjective view