I'm trying to calculate this type of calculation:
arr = np.arange(4)
# array([0, 1, 2, 3])
arr_t =arr.reshape((-1,1))
# array([[0],
# [1],
# [2],
# [3]])
mult_arr = np.multiply(arr,arr_t) # <<< the multiplication
# array([[0, 0, 0, 0],
# [0, 1, 2, 3],
# [0, 2, 4, 6],
# [0, 3, 6, 9]])
to eventually perform it in a bigger matrix index of single row, and to sum all the matrices that are reproduced by the calculation:
arr = np.random.random((600,150))
arr_t =arr.reshape((-1,arr.shape[1],1))
mult = np.multiply(arr[:,None],arr_t)
summed = np.sum(mult,axis=0)
summed
Till now its all pure awesomeness, the problem starts when I try to covert on a bigger dataset, for example this array instead :
arr = np.random.random((6000,1500))
I get the following error - MemoryError: Unable to allocate 101. GiB for an array with shape (6000, 1500, 1500) and data type float64
which make sense, but my question is:
can I get around this anyhow without being forced to use loops that slow down the process entirely ??
my question is mainly about performance and solution that require long running tasks more then 30 secs is not an option.
CodePudding user response:
Looks like you are simply trying to perform a dot product:
arr.T@arr
or
arr.T.dot(arr)
checking this is what you want
arr = np.random.random((600,150))
arr_t =arr.reshape((-1,arr.shape[1],1))
mult = np.multiply(arr[:,None],arr_t)
summed = np.sum(mult,axis=0)
np.allclose((arr.T@arr), summed)
# True