Replace consecutive identical elements at the beginning of an array with 0

I want to replace the first N identical consecutive numbers at the start of an array with 0.

import numpy as np


x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

OUT -> np.array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

A loop works, but what would be a faster, vectorized implementation?

i = 0
first = x[0]
# Check the bound first so x[i] is never read past the end of the array.
while i < x.size and x[i] == first:
    x[i] = 0
    i += 1

CodePudding user response：

You can use argmax on a boolean array to get the index of the first value that differs from x[0].

Then slice and replace:

n = (x!=x[0]).argmax()  # 4
x[:n] = 0

output:

array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

intermediate array:

(x!=x[0])

#                            n=4
# [False False False False  True  True  True  True  True  True  True  True
#  True  True  True  True  True  True  True]
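
One caveat with the argmax approach: if every element equals x[0], the boolean array contains no True, argmax returns 0, and nothing is replaced. The following is a minimal sketch of a guard for that case. The function name zero_leading_run is hypothetical and only used for illustration, and the sketch assumes the data contains no NaN values.

```python
import numpy as np

def zero_leading_run(x):
    """Return a copy of x with its leading run of identical values set to 0.

    Hypothetical helper for illustration; assumes x contains no NaN.
    """
    out = np.asarray(x).copy()
    if out.size == 0:
        return out
    changed = out != out[0]
    # argmax returns 0 when there is no True at all, so check any() explicitly
    # and fall back to the full length when the whole array is one run.
    n = changed.argmax() if changed.any() else out.size
    out[:n] = 0
    return out

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
print(zero_leading_run(x))
print(zero_leading_run(np.array([5, 5, 5])))
```

Working on a copy leaves the input array untouched; drop the copy if in-place modification is preferred, as in the snippet above.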

CodePudding user response：

My solution is based on itertools.groupby, so start with import itertools.

This function creates groups of consecutive equal values, unlike e.g. the pandas version of groupby, which collects all equal values from the input into a single group.

Another important feature is that you can assign any value to N, and only the first N values of each run of at least N consecutive equal values will be replaced.

To test my code, I set N = 4 and defined the source array as:

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])

Note that it contains 5 consecutive values of 2 at the end.

Then, to get the expected result, run:

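rv = []
# groupby yields one (key, group) pair per run of consecutive equal values
for key, grp in itertools.groupby(x):
    lst = list(grp)
    lgth = len(lst)
    # Zero the first N elements only if the run has at least N elements
    if lgth >= N:
        lst[0:N] = [0] * N
    rv.extend(lst)
xNew = np.array(rv)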

The result is:

[0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 0, 0, 0, 0, 2]

Note that a sequence of 4 zeroes occurs:

at the beginning (all 4 values of 1 have been replaced),
near the end (of the 5 values of 2, the first 4 have been replaced).
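
Since the question asks for a vectorized implementation, here is a rough NumPy-only sketch of the same per-run replacement, without a Python loop. It is not the groupby code above but a different technique (run-length bookkeeping with np.repeat), and the function name zero_first_n_of_long_runs is hypothetical.

```python
import numpy as np

def zero_first_n_of_long_runs(x, n):
    """Zero the first n elements of every run of at least n equal values.

    Hypothetical helper for illustration; returns a modified copy of x.
    """
    x = np.asarray(x)
    out = x.copy()
    if x.size == 0:
        return out
    # Start index of every run of consecutive equal values.
    starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
    # Length of every run.
    lengths = np.diff(np.r_[starts, x.size])
    # For each element: its offset within its run, and that run's length.
    pos_in_run = np.arange(x.size) - np.repeat(starts, lengths)
    run_len = np.repeat(lengths, lengths)
    out[(run_len >= n) & (pos_in_run < n)] = 0
    return out

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
print(zero_first_n_of_long_runs(x, 4))
```

For the test array and N = 4 this should give the same result as the groupby loop.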