Numpy array counter with reset-CodePudding

I have a numpy array with only -1, 1 and 0, like this:

np.array([1,1,-1,-1,0,-1,1])

I would like a new array that counts the -1 encountered. The counter must reset when a 0 appears and remain the same when it's a 1:

Desired output:

np.array([0,0,1,2,0,1,1])

The solution must be very little time consuming when used with larger array (up to 100 000)

Edit: Thanks for your contribution, I've a working solution for now.

I'm still looking for a non-iterative way to solve it (no for loop). Maybe with a pandas Series and the cumsum() method ?

CodePudding user response：

Maybe with a pandas Series and the cumsum() method?

Yes, use Series.cumsum and Series.groupby:

s = pd.Series([1, 1, -1, -1, 0, -1, 1])

s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()
# array([0, 0, 1, 2, 0, 1, 1])

Step-by-step

Create pseudo-groups that reset when equal to 0:

groups = s.eq(0).cumsum()
# array([0, 0, 0, 0, 1, 1, 1])

Then groupby these pseudo-groups and cumsum when equal to -1:

s.eq(-1).groupby(groups).cumsum().to_numpy()
# array([0, 0, 1, 2, 0, 1, 1])

Timings

not time consuming when used with larger array (up to 100,000)

groupby cumsum is ~8x faster than looping, given np.random.choice([-1, 0, 1], size=100_000):

%timeit series_cumsum(a)
# 3.29 ms ± 721 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit miki_loop(a)
# 26.5 ms ± 925 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit skyrider_loop(a)
# 26.8 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

CodePudding user response：

I seem to get a 10x speedup over Pandas solution with numba for this benchmark:

from numba import jit

inp1 = np.array([1,1,-1,-1,0,-1,1], dtype=int)
inp2 = np.random.randint(-1, 10, size=10**6)

@jit
def with_numba(arr):
  val = 0
  put = np.zeros_like(arr)
  for i in range(arr.size):
    if arr[i] == -1:
      val  = 1
    elif arr[i] == 0:
      val = 0
    put[i] = val

  return put

def with_pandas(inp):
  s = pd.Series(inp)
  return s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()
  
assert (with_numba(inp1) == with_pandas(inp1)).all()
assert (with_numba(inp2) == with_pandas(inp2)).all()

%timeit with_numba(inp2)
# 100 loops, best of 5: 4.57 ms per loop
%timeit with_pandas(inp2)
# 10 loops, best of 5: 46.3 ms per loop

CodePudding user response：

Use a for loop. Set a variable which starts at 1 and reset it each time you encounter a different number. For example:

counter = 1;
outputArray = [];
for number in npArray:
    if number == -1:
        outputArray.append(counter)
        counter  = 1
    elif number == 1:
        outputArray.append(0)
    else:
        outputArray.append(0)
        counter = 1
print(outputArray)

CodePudding user response：

Here is a fix for @skyrider's code

npArray = [1,1,-1,-1,0,-1,1]
counter = 0
outputArray = []
for number in npArray:
    if number == -1:
        counter  = 1
        outputArray.append(counter)
    elif number == 0:
        outputArray.append(0)
        counter = 0
    else:
        outputArray.append(counter)
print(outputArray)

CodePudding user response：

Yet another possibility, but uses np.split to split the array on 0 and cumulatively counts -1 (uses for-loop to cumsum, so nowhere near as efficient as @tdy's answer):

out = np.concatenate([(ary==-1).cumsum() for ary in np.split(arr, np.where(arr== 0)[0])])

CodePudding user response：

Let's first save your numpy array in a variable:

a = np.array([1,1,-1,-1,0,-1,1])

I define a variabel, count to hold the value you care about, and set it to be zero. Then I define a list to hold the new elements. Let's call it l. Then I iterate on elemnts of a and in each ieration I name the element i. Inside each iteration, I implement the logic:

if i is -1, then increase counter
else, if i is 0, reset the counter
and do nothing otherwise And finally, I append the counter to l. Lastly, convert l to be a numpy array, out.

l = []
count = 0
for i in a:
    if i == -1:
        count =1
    elif i==0: 
        count = 0
    l.append(count)
out = np.array(l)
out