I have a numpy array with only -1, 1 and 0, like this:
np.array([1,1,-1,-1,0,-1,1])
I would like a new array that holds a running count of the -1 values encountered. The counter must reset to 0 when a 0 appears and stay unchanged when the value is 1:
Desired output:
np.array([0,0,1,2,0,1,1])
The solution must be fast when used on larger arrays (up to 100,000 elements).
Edit: Thanks for your contributions, I have a working solution for now.
I'm still looking for a non-iterative way to solve it (no for loop). Maybe with a pandas Series and the cumsum() method?
CodePudding user response:
Maybe with a pandas Series and the cumsum() method?
Yes, use Series.cumsum and Series.groupby:
s = pd.Series([1, 1, -1, -1, 0, -1, 1])
s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()
# array([0, 0, 1, 2, 0, 1, 1])
Step-by-step
Create pseudo-groups that reset when equal to 0:
groups = s.eq(0).cumsum() # array([0, 0, 0, 0, 1, 1, 1])
Then groupby these pseudo-groups and cumsum when equal to -1:
s.eq(-1).groupby(groups).cumsum().to_numpy() # array([0, 0, 1, 2, 0, 1, 1])
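To make the intermediate stages visible, here is a small sketch that runs the two steps on the sample array and prints each result. The name is_neg is only illustrative and does not appear in the answer above.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, -1, -1, 0, -1, 1])

# Each 0 starts a new pseudo-group: the label is the number of zeros seen so far.
groups = s.eq(0).cumsum()

# Boolean mask marking the values we want to count.
is_neg = s.eq(-1)

# Cumulative sum of the mask, restarted inside every pseudo-group.
out = is_neg.groupby(groups).cumsum().to_numpy()

print(groups.to_numpy())  # [0 0 0 0 1 1 1]
print(out)                # [0 0 1 2 0 1 1]
```

Because the 0 itself opens each new group and is never counted by the mask, the count at every reset position is 0, which matches the desired output.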
Timings
not time consuming when used with larger array (up to 100,000)
groupby with cumsum is ~8x faster than looping, given np.random.choice([-1, 0, 1], size=100_000):
%timeit series_cumsum(a)
# 3.29 ms ± 721 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit miki_loop(a)
# 26.5 ms ± 925 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit skyrider_loop(a)
# 26.8 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
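The benchmarked functions are not shown in this answer, so the following is only a sketch of how such a comparison could be set up outside IPython. series_cumsum is assumed to wrap the groupby/cumsum one-liner, and plain_loop is a hypothetical stand-in for the loop-based answers; actual timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd


def series_cumsum(a):
    # Assumed wrapper around the groupby/cumsum one-liner.
    s = pd.Series(a)
    return s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()


def plain_loop(a):
    # Hypothetical stand-in for the loop-based answers.
    out = np.empty(len(a), dtype=np.int64)
    count = 0
    for i, x in enumerate(a):
        if x == -1:
            count += 1
        elif x == 0:
            count = 0
        out[i] = count
    return out


rng = np.random.default_rng(0)
a = rng.choice([-1, 0, 1], size=100_000)

# Both approaches should agree before their speed is compared.
assert np.array_equal(series_cumsum(a), plain_loop(a))

for fn in (series_cumsum, plain_loop):
    best = min(timeit.repeat(lambda: fn(a), number=1, repeat=3))
    print(f"{fn.__name__}: {best * 1e3:.2f} ms")
```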
CodePudding user response:
I seem to get a 10x speedup over the pandas solution with numba for this benchmark:
import numpy as np
import pandas as pd
from numba import jit

inp1 = np.array([1,1,-1,-1,0,-1,1], dtype=int)
inp2 = np.random.randint(-1, 10, size=10**6)

@jit
def with_numba(arr):
    val = 0
    put = np.zeros_like(arr)
    for i in range(arr.size):
        if arr[i] == -1:
            val += 1  # one more -1 since the last reset
        elif arr[i] == 0:
            val = 0  # a 0 resets the counter
        put[i] = val  # any other value keeps the current count
    return put

def with_pandas(inp):
    s = pd.Series(inp)
    return s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()

# Both implementations must agree on the sample and on the random input
assert (with_numba(inp1) == with_pandas(inp1)).all()
assert (with_numba(inp2) == with_pandas(inp2)).all()

%timeit with_numba(inp2)
# 100 loops, best of 5: 4.57 ms per loop
%timeit with_pandas(inp2)
# 10 loops, best of 5: 46.3 ms per loop
CodePudding user response:
Use a for loop. Set a variable that starts at 1 and reset it each time you encounter a different number. For example:
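npArray = [1,1,-1,-1,0,-1,1]
counter = 1
outputArray = []
for number in npArray:
    if number == -1:
        outputArray.append(counter)
        counter += 1
    elif number == 1:
        # note: a 1 yields 0 here instead of keeping the current count
        outputArray.append(0)
    else:
        outputArray.append(0)
        counter = 1  # a 0 resets the counter
print(outputArray)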
CodePudding user response:
Here is a fix for @skyrider's code:
npArray = [1,1,-1,-1,0,-1,1]
counter = 0
outputArray = []
for number in npArray:
    if number == -1:
        counter += 1  # count this -1
        outputArray.append(counter)
    elif number == 0:
        outputArray.append(0)
        counter = 0  # a 0 resets the counter
    else:
        outputArray.append(counter)  # a 1 keeps the current count
print(outputArray)
CodePudding user response:
Yet another possibility: use np.split to split the array at each 0 and cumulatively count the -1 values within each piece (this uses a for loop to apply cumsum, so it is nowhere near as efficient as @tdy's answer):
out = np.concatenate([(ary == -1).cumsum() for ary in np.split(arr, np.where(arr == 0)[0])])
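For comparison, the same idea can be written without any Python-level loop by subtracting, at each position, the running count recorded at the most recent 0. This is a sketch under that assumption, not code from any of the answers; the function name count_neg_ones and its local names are made up for illustration.

```python
import numpy as np


def count_neg_ones(arr):
    """Running count of -1 values that restarts at every 0 (illustrative helper)."""
    arr = np.asarray(arr)
    # Running count of -1 over the whole array, ignoring resets.
    running = np.cumsum(arr == -1)
    # Value of that running count at each 0, and 0 everywhere else.
    at_reset = np.where(arr == 0, running, 0)
    # running never decreases, so the cumulative maximum carries the count
    # at the most recent 0 forward to the following positions.
    baseline = np.maximum.accumulate(at_reset)
    return running - baseline


arr = np.array([1, 1, -1, -1, 0, -1, 1])
print(count_neg_ones(arr))  # [0 0 1 2 0 1 1]
```

The trade-off is a few temporary arrays of the same length as the input in exchange for avoiding both the per-segment loop and the pandas dependency.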
CodePudding user response:
Let's first save your numpy array in a variable:
a = np.array([1,1,-1,-1,0,-1,1])
I define a variable, count, to hold the value you care about, and set it to zero.
Then I define a list to hold the new elements. Let's call it l.
Then I iterate over the elements of a, and in each iteration I name the element i.
Inside each iteration, I implement the logic:
- if i is -1, then increase count
- else, if i is 0, reset count
- and do nothing otherwise
And finally, I append count to l. Lastly, I convert l to a numpy array, out.
l = []
count = 0
for i in a:
    if i == -1:
        count += 1  # increase the counter on -1
    elif i == 0:
        count = 0  # reset the counter on 0
    l.append(count)  # a 1 leaves the counter unchanged
out = np.array(l)
out