Indexed sum of numpy array?-CodePudding

I would like to generate a numpy array by performing a sum of indexed values from another array

For for example, given the following arrays:

row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])

I would like so set a new array indexed_sum in the following way:

for i in range(row_indices.size):
    indexed_sum[row_indices.flat[i], col_indices.flat[i]]  = values.flat[i]

Such that:

indexed_sum = np.array([[0, 2], [4, 11]])

However, since this is a python loop and these arrays can be very large, this takes an unacceptable amount of time. Is there an efficient numpy method that I can use to accomplish this?

CodePudding user response：

Since the tradeoff between speed and memmory-ussage I think your method is well situable. But you can still make it faster:

avoid flattening the arrays inside the loop this will save you some time
instead of using .flat[:] or .flatten() use .ravel() (I'm not sure why but it seems to be faster)
also avoid for i in range.. just zip in the values of interest (see method3)

Here a good solution that will speed-up things:

    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    
    indexed_sum = np.zeros((row_indices.max() 1,col_indices.max() 1))
    for i,j,v in zip(r_i_f,c_i_f,v_f):
        indexed_sum[i, j]  = v
    return indexed_sum

To see a comparision here's some toy code (correct any detail it's not proportioned and let me know if it works well for you)




def method1(values,row_indices,col_indices):
    """OP method"""
    indexed_sum = np.zeros((row_indices.max() 1,col_indices.max() 1))
    for i in range(row_indices.size):
        indexed_sum[row_indices.flat[i], col_indices.flat[i]]  = values.flat[i]
    return indexed_sum

def method2(values,row_indices,col_indices):
    """just raveling before loop. Time saved here is considerable"""
    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    
    indexed_sum = np.zeros((row_indices.max() 1,col_indices.max() 1))
    for i in range(row_indices.size):
        indexed_sum[r_i_f[i], c_i_f[i]]  = v_f[i]
    return indexed_sum

def method3(values,row_indices,col_indices):
    """raveling, then avoiding range(...), just zipping
    the time saved here is small but by no cost"""
    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    
    indexed_sum = np.zeros((row_indices.max() 1,col_indices.max() 1))
    for i,j,v in zip(r_i_f,c_i_f,v_f):
        indexed_sum[i, j]  = v
    return indexed_sum




from time import perf_counter
import numpy as np


out_size = 50
in_shape = (5000,5000)

values = np.random.randint(10,size=in_shape)
row_indices = np.random.randint(out_size,size=in_shape)
col_indices = np.random.randint(out_size,size=in_shape)


t1 = perf_counter()
v1 = method1(values,row_indices,col_indices)
t2 = perf_counter()
v2 = method2(values,row_indices,col_indices)
t3 = perf_counter()
v3 = method3(values,row_indices,col_indices)
t4 = perf_counter()

print(f"method1: {t2-t1}")
print(f"method2: {t3-t2}")
print(f"method3: {t4-t3}")

Outputs for values of shape 5000x5000 and output shaped as 50x50:

method1: 23.66934896100429
method2: 14.241692076990148
method3: 11.415708078013267

aditional a comparison between fltten methods (in my computer)

q = np.random.randn(5000,5000)
t1 = perf_counter()
q1 = q.flatten()
t2 = perf_counter()
q2 = q.ravel()
t3 = perf_counter()
q3 = q.reshape(-1)
t4 = perf_counter()
q4 = q.flat[:]
t5 = perf_counter()
#print times:
print(f"q.flatten: {t2-t1}")
print(f"q.ravel: {t3-t2}")
print(f"q.reshape(-1): {t4-t3}")
print(f"q.flat[:]: {t5-t4}")

Outputs:

q.flatten: 0.043878231997950934
q.ravel: 5.550700007006526e-05
q.reshape(-1): 0.0006349250033963472
q.flat[:]: 0.08832104799512308

CodePudding user response：

You might find success with numba, another Python package. I timed the following two functions in a Jupyter notebook with %%timeit. Results are below:

# Your loop, but in a function.
def run_sum(row_indicies, col_indicies, values):
    indexed_sum = np.zeros((row_indicies.max()   1, col_indicies.max()   1))
    for i in range(row_indicies.size):
        indexed_sum[row_indicies.flat[i], col_indicies.flat[i]]  = values.flat[i]
    return indexed_sum


# Your loop with a numba decorator.
@numba.jit(nopython=True)  # note you may be able to parallelize too
def run_sum_numba(row_indicies, col_indicies, values):
    indexed_sum = np.zeros((row_indicies.max()   1, col_indicies.max()   1))
    for i in range(row_indicies.size):
        indexed_sum[row_indicies.flat[i], col_indicies.flat[i]]  = values.flat[i]
    return indexed_sum

My example data to have something bigger to chew on:

row_id_big = np.random.randint(0, 100, size=(1000,))
col_id_big = np.random.randint(0, 100, size=(1000,))
values_big = np.random.randint(0, 10, size=(1000,))

Results:

%timeit run_sum(row_id_big, col_id_big, values_big)
# 1.04 ms ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit run_sum_numba(row_id_big, col_id_big, values_big)
# 3.85 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The loop with the numba decorator is a couple hundred times faster in this example. I'm not positive about the memory usage compared to your example. I had to initialize a numpy array to have somewhere to put the data, but if you have a better way of doing that step you might be able to improve performance further.

A note with numba: you need to run your loop once to start seeing the major speed improvements. You might be able to initialize the jit with just a toy example like yours here and see the same speedup.

CodePudding user response：

There's a lot of options for this, but they're all reinventing a wheel that kinda already exists.

import numpy as np
from scipy import sparse

row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])

What you want is the built-in behavior for the scipy sparse matrices:

arr = sparse.coo_matrix((values.flat, (row_indices.flat, col_indices.flat)))

Which yields a sparse data structure:

>>> arr
<2x2 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in COOrdinate format>

But you can convert it back to a numpy array easily:

>>> arr.A
array([[ 0,  2],
       [ 4, 11]])