I have the NumPy array:
arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
[1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
[8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])
I need the result array like this:
[[0, 0, 0, 0, 7, 0, 0, 0, 9, 0, 3],
 [0, 0, 3, 0, 0, 0, 0, 0, 12, 0, 0],
 [0, 0, 0, 0, 25, 0, 1, 0, 0, 0, 0]]
What is happening:
We go along the row. If an element is 0, we move on to the next element; if it is not 0, we sum up the elements until a 0 is met. Once the 0 is met, we replace it with the resulting sum (and also replace the original non-zero numbers with 0).
I already know how to do this with loops, but that is too slow for a large number of rows, so I need a time-efficient solution using NumPy methods. I tried cumsum and some logical methods.
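For reference, the rule above can be written as a plain-Python double loop. This is a minimal sketch (the name `run_sums_loop` is hypothetical), not the asker's code; it returns a new array instead of modifying the input. It also makes explicit something the expected output implies: a non-zero run that reaches the end of a row with no zero after it is dropped, as with the trailing 1 in the last row.

```python
import numpy as np

def run_sums_loop(arr):
    """Hypothetical reference: write each non-zero run's sum into the zero that ends it."""
    out = np.zeros_like(arr)
    for i in range(arr.shape[0]):
        current_sum = 0
        for j in range(arr.shape[1]):
            if arr[i, j] == 0:
                # A zero closes the current run (current_sum is 0 if there was no run)
                out[i, j] = current_sum
                current_sum = 0
            else:
                # Non-zero: accumulate; the output stays 0 at this position
                current_sum += arr[i, j]
        # A run still open at the end of the row has no zero to land on, so it is discarded
    return out

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])
print(run_sums_loop(arr))
```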
Update: here is my attempt:
zero_el = arr == 0
np.where(zero_el, arr.cumsum(axis=1), 0)
> [[ 0 0 0 0 7 7 0 0 16 0 19]
[ 0 0 3 3 3 3 0 0 15 15 15]
[ 0 0 0 0 25 0 26 26 26 26 0]]
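The attempt above yields running totals rather than per-run sums. One way to repair it (a sketch of a different technique, not taken from the answers below; `run_sums_cumsum` is a hypothetical name) is to subtract, at each closing zero, the running total at the previous closing zero in the same row. `np.maximum.accumulate` over column indices finds that previous position without a Python loop.

```python
import numpy as np

def run_sums_cumsum(arr):
    # "End" positions: a zero that immediately follows a non-zero in the same row
    ends = np.zeros(arr.shape, dtype=bool)
    ends[:, 1:] = (arr[:, 1:] == 0) & (arr[:, :-1] != 0)

    # Exclusive running total: excl[i, j] == arr[i, :j].sum()
    excl = arr.cumsum(axis=1) - arr

    # For each column, the index of the latest end at or before it (0 if none yet)
    cols = np.arange(arr.shape[1])
    last_end = np.maximum.accumulate(np.where(ends, cols, 0), axis=1)

    # Shift right by one so each end looks at the *previous* end, not at itself
    prev_end = np.zeros_like(last_end)
    prev_end[:, 1:] = last_end[:, :-1]

    # Run sum = total up to this end minus total up to the previous end (or row start)
    run_sums = excl - np.take_along_axis(excl, prev_end, axis=1)
    return np.where(ends, run_sums, 0)

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])
print(run_sums_cumsum(arr))
```

Everything stays within each row, and because it only subtracts running totals it does not rely on them increasing, so negative non-zero values are handled as well. The cost is a few temporary arrays the size of `arr`.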
Answer:
First, we want to find the locations where a zero immediately follows a non-zero in the same row.
rr, cc = np.where((arr[:, 1:] == 0) & (arr[:, :-1] != 0))
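To see what this mask picks out, here it is on the question's array. `cc` is the column of the last non-zero in each run, so the closing zero sits at column `cc + 1`; the trailing 1 in the last row is not matched because no column follows it.

```python
import numpy as np

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])

# True where column j + 1 is zero and column j is non-zero, compared within each row
rr, cc = np.where((arr[:, 1:] == 0) & (arr[:, :-1] != 0))
print(rr)  # [0 0 0 1 1 2 2]
print(cc)  # [3 7 9 1 7 3 5]
```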
Now, we can use np.add.reduceat to add elements. Unfortunately, reduceat needs a list of 1-d indices, so we're going to have to play with shapes a little. Calculating the equivalent indices of rr, cc in a flattened array is easy (we add 1 because each sum belongs at the zero in column cc + 1):
reduce_indices = rr * arr.shape[1] + cc + 1
# array([ 4, 8, 10, 13, 19, 26, 28])
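To make the reduceat semantics concrete: each output element is the sum of the slice from one index up to (but not including) the next, and the last slice runs to the end of the array. Per the NumPy documentation, if an index is not smaller than the one after it, the output is just the element at that index, which is one reason the combined indices are sorted below.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Slices x[0:2], x[2:3], x[3:] -> sums 3, 3, 9
print(np.add.reduceat(x, [0, 2, 3]))  # [3 3 9]

# Unsorted indices: since 3 >= 1, the middle output is just x[3], not a slice sum
print(np.add.reduceat(x, [0, 3, 1]))  # [ 6  4 14]
```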
We want to reduce from the start of every row, so we'll create a row_starts array to mix in with the indices calculated above:
row_starts = np.arange(arr.shape[0]) * arr.shape[1]
# array([ 0, 11, 22])
reduce_indices = np.hstack((row_starts, reduce_indices))
reduce_indices.sort()
# array([ 0, 4, 8, 10, 11, 13, 19, 22, 26, 28])
Now, call np.add.reduceat on the flattened input array, reducing at reduce_indices:
totals = np.add.reduceat(arr.flatten(), reduce_indices)
# array([ 7, 9, 3, 0, 3, 12, 0, 25, 1, 1])
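Now that we have the totals, we need to assign them to an array of zeros. Note that the 0th element of totals needs to go to the 1st index of reduce_indices (each total is the sum of the segment that ends just before the next reduce index, which is where the closing zero sits), and the last element of totals is to be discarded: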
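result_f = np.zeros((arr.size,))
result_f[reduce_indices[1:]] = totals[:-1]
# Column 0 can never hold a sum; clear any leftover from a previous row's unterminated run
result_f[row_starts] = 0
result = result_f.reshape(arr.shape)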
which gives the expected result:
array([[ 0., 0., 0., 0., 7., 0., 0., 0., 9., 0., 3.],
[ 0., 0., 3., 0., 0., 0., 0., 0., 12., 0., 0.],
[ 0., 0., 0., 0., 25., 0., 1., 0., 0., 0., 0.]])
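The steps above can be collected into one function. This is a sketch with a hypothetical name, `run_sums_reduceat`; it keeps the input dtype instead of producing floats, and it zeroes the first column after the assignment. That last line matters when a row other than the last ends in a non-zero run with no closing zero: that row's final segment then extends to the next row's start, so its leftover sum would land in the next row's column 0. The question's array happens not to trigger this because its only unterminated run is in the last row, whose total is the discarded `totals[-1]`.

```python
import numpy as np

def run_sums_reduceat(arr):
    n_rows, n_cols = arr.shape
    # Flat positions of every zero that directly follows a non-zero
    rr, cc = np.where((arr[:, 1:] == 0) & (arr[:, :-1] != 0))
    end_indices = rr * n_cols + cc + 1
    # Also start a fresh segment at the beginning of every row
    row_starts = np.arange(n_rows) * n_cols
    reduce_indices = np.sort(np.concatenate((row_starts, end_indices)))
    totals = np.add.reduceat(arr.ravel(), reduce_indices)
    # totals[k] is the sum of the segment ending just before reduce_indices[k + 1]
    result_f = np.zeros(arr.size, dtype=arr.dtype)
    result_f[reduce_indices[1:]] = totals[:-1]
    # Column 0 never holds a sum; clear leakage from the previous row's open run
    result_f[row_starts] = 0
    return result_f.reshape(arr.shape)

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])
print(run_sums_reduceat(arr))
```

For example, `np.array([[0, 4, 0, 7], [2, 0, 0, 0]])` gives `[[0, 0, 4, 0], [0, 2, 0, 0]]`; without the column-0 reset, the 7 from the first row would appear at position `[1, 0]`.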
Answer:
We can solve this with two for loops. In every row we keep a current_sum: if the number is zero, we replace it with current_sum and reset current_sum to 0; if the number is not zero, we add it to current_sum and replace the number with 0.
Edit: Sorry, at first I didn't realize you wanted an efficient solution. We can use Numba to accelerate the for loops; it is simple and powerful. Here is the code:
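import numpy as np
import numba

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])

@numba.jit(nopython=True)
def mySum(array):
    # Note: overwrites `array` in place and returns it
    for i in range(array.shape[0]):
        current_sum = 0
        for j in range(array.shape[1]):
            if array[i, j] == 0:
                # A zero closes the current run: write the sum here and reset
                array[i, j] = current_sum
                current_sum = 0
            else:
                # Non-zero: add it to the running sum and blank it out
                current_sum += array[i, j]
                array[i, j] = 0
    return array

print(mySum(arr.copy()))  # pass a copy to keep arr unchanged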
The function is slow on the first call because Numba inspects the input types and compiles the function to machine code; after that, calls are much faster. I hope it is fast enough for your case.
Answer:
This may be slower than a loop, but let me demonstrate with a single array:
import numpy as np

a = np.array([0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0])
zero_index = np.where(a == 0)[0]
# Split at the zeros, sum each slice, and drop the last slice (it has no closing zero)
replace_arr = np.array(list(map(sum, np.split(a, zero_index))))[:-1]
output = np.zeros(a.shape)
# Put the sums into the zero positions
np.put_along_axis(output, zero_index, replace_arr, axis=0)
output
# array([0., 0., 0., 0., 7., 0., 0., 0., 9., 0., 3.])
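The same split-and-sum idea can be applied to each row of the 2-D array from the question. A minimal sketch follows (the helper names `split_sums_1d` and `split_sums_2d` are hypothetical); it still loops over rows in Python, so it is mainly a readable cross-check rather than the fastest option.

```python
import numpy as np

def split_sums_1d(a):
    zero_index = np.flatnonzero(a == 0)
    # Pieces are a[:z0], a[z0:z1], ..., a[z_last:]; piece k ends just before zero z_k
    pieces = np.split(a, zero_index)
    # Drop the final piece: it has no closing zero to write into
    sums = np.array([piece.sum() for piece in pieces])[:-1]
    output = np.zeros_like(a)
    output[zero_index] = sums
    return output

def split_sums_2d(arr):
    # Apply the 1-D version to every row independently
    return np.array([split_sums_1d(row) for row in arr])

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])
print(split_sums_2d(arr))
```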