I have a very long 2-D numpy array and want to count values inside intervals. I can do it using a double loop, however it is very time consuming. Can anyone give me a faster alternative way? I guess it will be better with no loops.
Below is a simple code example showing what I want to do in a fast way.
a = np.random.random([10000, 2])
a[:, 1] += 2  # So we have the first column with values between 0. and 1.,
              # and the 2nd column with values between 2. and 3.
for i in range(10):
    for j in range(5):
        s0 = a[a[:, 0] >= i * 0.1]
        s1 = s0[s0[:, 0] < (i + 1) * 0.1]
        s2 = s1[s1[:, 1] >= 2 + j * 0.2]
        s3 = s2[s2[:, 1] < 2 + (j + 1) * 0.2]
        print(len(s3))
Additional information: I tried using masked arrays, but it did not work because I need to compare an array with both lower and higher limits. As far as I know, masked arrays only allow comparing the values inside a numpy array with floats, not with another array.
CodePudding user response:
The operation is inefficient because it creates a lot of temporary arrays and reads/writes relatively large arrays over and over: 4 boolean arrays + 4 floating-point arrays per iteration, and there are 50 iterations. That means 400 arrays, not to mention that each one needs to be read/written completely. Additionally, creating an array just to count items is not efficient either: you can just use np.count_nonzero instead. Note that printing is slow too, but I guess you will not use it in real-world code.
Additionally, the memory access pattern is not efficient: a[:, 0] and a[:, 1] are strided views that prevent Numpy from vectorizing the code. They also cause twice as much data to be read from the memory hierarchy. The transposed version should be preferred (with a copy, so as to avoid strided views). The transposed array can be precomputed once.
Here is an improved version:
a = np.random.random([10000, 2])
a[:, 1] += 2
x, y = a.T.copy()
for i in range(10):
    for j in range(5):
        cond = x >= i * 0.1
        cond &= x < (i + 1) * 0.1
        cond &= y >= 2 + j * 0.2
        cond &= y < 2 + (j + 1) * 0.2
        len_s3 = np.count_nonzero(cond)
        #print(len_s3)
This is about 6 times faster on my machine. Note that boolean arrays are still created, but they are much faster to create and fill since they are 8 times smaller in memory than double-precision floating-point ones. You can use functions like np.logical_and combined with the out parameter to speed up the computation a bit, but the impact is pretty small (most of the cost comes from the internal copy and Numpy's internal overheads).
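For example, a minimal sketch of the out= idea (the buffer names cond and tmp are mine): two boolean buffers are preallocated once and reused, so the comparisons and the logical_and do not allocate new temporaries on every iteration.
import numpy as np

a = np.random.random([10000, 2])
a[:, 1] += 2
x, y = a.T.copy()

cond = np.empty(x.shape, dtype=bool)  # result buffer, reused for every cell
tmp = np.empty(x.shape, dtype=bool)   # scratch buffer for the second operand

for i in range(10):
    for j in range(5):
        np.greater_equal(x, i * 0.1, out=cond)
        np.less(x, (i + 1) * 0.1, out=tmp)
        np.logical_and(cond, tmp, out=cond)
        np.greater_equal(y, 2 + j * 0.2, out=tmp)
        np.logical_and(cond, tmp, out=cond)
        np.less(y, 2 + (j + 1) * 0.2, out=tmp)
        np.logical_and(cond, tmp, out=cond)
        len_s3 = np.count_nonzero(cond)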
If this is not enough, you can use Numba to speed this up significantly. An alternative solution is to sort the array so as to perform a fast binary search on the sub-parts, though this is a bit more tricky to do.
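For reference, here is a possible Numba sketch (not a definitive implementation; the function name count_cells and the hard-coded bin widths are assumptions based on the example above). Instead of scanning the whole array once per cell, it bins every point in a single pass and returns all 10x5 counts at once:
import numba as nb
import numpy as np

@nb.njit(cache=True)
def count_cells(x, y):
    # counts[i, j] = number of points with x in [i*0.1, (i+1)*0.1)
    #                and y in [2 + j*0.2, 2 + (j+1)*0.2)
    counts = np.zeros((10, 5), dtype=np.int64)
    for k in range(x.size):
        i = int(x[k] / 0.1)
        j = int((y[k] - 2.0) / 0.2)
        if 0 <= i < 10 and 0 <= j < 5:
            counts[i, j] += 1
    return counts

a = np.random.random([10000, 2])
a[:, 1] += 2
x, y = a.T.copy()
print(count_cells(x, y))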