Can someone help me make this faster?
import numpy as np
# an array to split
a = np.array([0,0,1,0,1,1,1,0,1,1,0,0,0,1])
# idx where the number changes
idx = np.where(np.roll(a,1)!=a)[0][1:]
# split of array into groups
aout = np.split(a,idx)
# sum of each group
sumseg = [aa.sum() for aa in aout]
#fill criteria
idx2 = np.where( (np.array(sumseg)>0) & (np.array(sumseg)<2) )
#fill targets
[aout[ai].fill(0) for ai in idx2[0]]
# a is now updated? didn't follow how a gets updated
# return a
I noticed that a gets updated through this process, but I didn't understand how those objects remained linked through the splitting, etc.
If it is important or helps, a is actually a 2D array, and I am looping over each row/column performing this operation.
CodePudding user response:
Better solution 1D:
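We can use a convolution to count the 1s in each element's 3-wide neighborhood (itself plus its left and right neighbors). A 1 is kept only if that count is greater than 1, meaning it has at least one adjacent 1: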
aout = ((np.convolve(a,[1,1,1],mode='same')>1)&(a>0)).astype(a.dtype)
# aout = array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0])
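To see why the threshold is > 1 and why the extra (a>0) mask is needed, it can help to look at the intermediate neighbor counts. This is only a small sketch using the same a as in the question; the names counts and has_neighbor are illustrative and not part of the original one-liner:

```python
import numpy as np

a = np.array([0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1])

# counts[i] = a[i-1] + a[i] + a[i+1], with zeros assumed beyond both ends
counts = np.convolve(a, [1, 1, 1], mode='same')
print(counts)   # [0 1 1 2 2 3 2 2 2 2 1 0 1 1]

# A 1 with no adjacent 1 has a count of exactly 1, so "> 1" rejects it.
# A 0 sitting between two 1s (index 3 or 7) also reaches 2,
# which is why the result is masked with (a > 0).
has_neighbor = counts > 1
aout = (has_neighbor & (a > 0)).astype(a.dtype)
print(aout)     # [0 0 0 0 1 1 1 0 1 1 0 0 0 0]
```

Note that np.convolve with mode='same' returns an array as long as the longer of its two inputs, so this only lines up with a when a has at least 3 elements.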
Better solution 2D:
from scipy.signal import convolve2d
a = np.array([[1, 1, 0, 0, 0, 0, 1, 0, 0, 1],
              [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]])
# the 1x3 kernel sums each element with its left and right neighbors, row by row
aout = ((convolve2d(a,np.ones((1,3)),mode='same')>1)&(a>0)).astype(a.dtype)
# aout = array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
#               [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]])
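Since the question mentions looping over rows and columns, here is a hedged, NumPy-only sketch of the same neighbor-count idea that can run along either axis without SciPy. The helper name drop_isolated_ones and its axis parameter are assumptions made for illustration; they do not come from the original answer or from any library:

```python
import numpy as np

def drop_isolated_ones(a, axis=1):
    """Zero out 1s that have no adjacent 1 along `axis` (hypothetical helper)."""
    a = np.asarray(a)
    # Pad a single zero on each side, along `axis` only.
    pad = [(0, 0)] * a.ndim
    pad[axis] = (1, 1)
    p = np.pad(a, pad)
    n = a.shape[axis]
    # before[i] is a[i-1] and after[i] is a[i+1] along `axis` (zero at the edges).
    before = np.take(p, np.arange(0, n), axis=axis)
    after = np.take(p, np.arange(2, n + 2), axis=axis)
    counts = before + a + after
    return ((counts > 1) & (a > 0)).astype(a.dtype)

rows = np.array([[1, 1, 0, 0, 0, 0, 1, 0, 0, 1],
                 [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]])
by_row = drop_isolated_ones(rows, axis=1)   # same grouping as the 1x3 kernel

cols = np.array([[1, 0, 1],
                 [1, 0, 0],
                 [0, 1, 1],
                 [1, 1, 1]])
by_col = drop_isolated_ones(cols, axis=0)   # groups run down each column
```

With SciPy, the column-wise case should correspond to passing a 3x1 kernel, np.ones((3,1)), to convolve2d instead of the 1x3 one.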
Why did a change?
To understand why a is updated in your process, you need to understand the difference between a copy and a view.
From the documentation:
View
It is possible to access the array differently by just changing certain metadata like stride and dtype without changing the data buffer. This creates a new way of looking at the data and these new arrays are called views. The data buffer remains the same, so any changes made to a view reflects in the original copy. A view can be forced through the ndarray.view method.
Copy
When a new array is created by duplicating the data buffer as well as the metadata, it is called a copy. Changes made to the copy do not reflect on the original array. Making a copy is slower and memory-consuming but sometimes necessary. A copy can be forced by using ndarray.copy.
Now, np.split() returns views, not copies, of a, so each sub-array in aout still points to the same data buffer as a. If you change aout in place, you change a.
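A small sketch of that behaviour, using a shortened array and hand-picked split points (both are assumptions for illustration). np.shares_memory reports whether two arrays overlap in memory:

```python
import numpy as np

a = np.array([0, 0, 1, 0, 1, 1])
parts = np.split(a, [2, 3])            # three pieces: [0 0], [1], [0 1 1]

# The piece is a view: it shares a's data buffer.
shares = np.shares_memory(a, parts[1])
print(shares)                          # True

parts[1].fill(0)                       # in-place write through the view
print(a)                               # [0 0 0 0 1 1], so a[2] was changed

# An explicit copy is independent of a.
b = a.copy()
b[0] = 9
print(a[0])                            # 0, a is unaffected
```

If the original array must stay untouched, one option is to call a.copy() first and split the copy.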
Benchmarking
import numpy as np
a = np.random.randint(0,2,(1000000,))
def continuous_split(a):
    # note: fill() writes through the views, so a itself is modified in place
    idx = np.where(np.roll(a,1)!=a)[0][1:]
    aout = np.split(a,idx)
    sumseg = [aa.sum() for aa in aout]
    idx2 = np.where( (np.array(sumseg)>0) & (np.array(sumseg)<2) )
    [aout[ai].fill(0) for ai in idx2[0]]
    return aout
def continuous_conv(a):
    return ((np.convolve(a,[1,1,1],mode='same')>1)&(a>0)).astype(a.dtype)
# %timeit is an IPython/Jupyter magic
%timeit continuous_split(a)
%timeit continuous_conv(a)
np.split() solution:
668 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.convolve() solution:
7.63 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
On this input, the convolution version is nearly 90 times faster.
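Before relying on the timings, it may be worth checking that the two approaches produce the same values. The sketch below is an assumption-laden variant rather than the exact benchmark code: continuous_split here works on a copy and returns the modified array instead of the list of pieces, and the list comprehension is replaced by a plain loop:

```python
import numpy as np

def continuous_split(a):
    # Variant of the split-based approach: operate on a copy so the input
    # is untouched, and return the modified array.
    a = a.copy()
    idx = np.where(np.roll(a, 1) != a)[0][1:]
    pieces = np.split(a, idx)           # views into the copy
    for piece in pieces:
        if piece.sum() == 1:            # same as (sum > 0) & (sum < 2) for 0/1 data
            piece.fill(0)
    return a

def continuous_conv(a):
    return ((np.convolve(a, [1, 1, 1], mode='same') > 1) & (a > 0)).astype(a.dtype)

# Compare on random 0/1 arrays. Lengths below 3 are skipped because
# np.convolve(..., mode='same') returns an array as long as the longer input.
rng = np.random.default_rng(0)
for n in (3, 10, 1000):
    for _ in range(20):
        a = rng.integers(0, 2, size=n)
        assert np.array_equal(continuous_split(a), continuous_conv(a))
```

This only checks agreement on 0/1 input; arrays with other values would need a different criterion.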