Replace consecutive identical elements at the beginning of an array with 0

Time:02-10

I want to replace the first N identical consecutive numbers at the start of an array with 0.

import numpy as np


x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

OUT -> np.array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

A loop works, but what would be a faster, vectorized implementation?

i = 0
first = x[0]
# Check the bound first so x[i] is never read past the end of the array
while i < x.size and x[i] == first:
    x[i] = 0
    i += 1

CodePudding user response:

You can use argmax on a boolean array to get the index of the first changing value.

Then slice and replace:

n = (x!=x[0]).argmax()  # 4
x[:n] = 0

output:

array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

intermediate array:

(x!=x[0])

#                            n=4
# [False False False False  True  True  True  True  True  True  True  True
#  True  True  True  True  True  True  True]
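One edge case is worth noting: argmax returns 0 when the boolean array contains no True at all, so if every element equals x[0] the slice x[:0] is empty and nothing is replaced. A minimal sketch that handles this, assuming you want the whole array zeroed in that case (the function name zero_leading_run is made up for illustration):

```python
import numpy as np


def zero_leading_run(x):
    """Return a copy of x with its leading run of equal values set to 0.

    Hypothetical helper, not part of NumPy.
    """
    out = np.array(x, copy=True)
    if out.size == 0:
        return out
    changed = out != out[0]
    # argmax gives 0 when no element differs from the first one,
    # so fall back to the full length in that case.
    n = changed.argmax() if changed.any() else out.size
    out[:n] = 0
    return out


a = zero_leading_run(np.array([1, 1, 1, 1, 2, 3, 1]))
b = zero_leading_run(np.array([5, 5, 5]))
```

Here a keeps everything after the first four values, while b, which has no changing value, is zeroed entirely.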

CodePudding user response:

My solution is based on itertools.groupby, so start with import itertools.

This function creates groups of consecutive equal values, unlike e.g. the pandas version of groupby, which collects all equal values from the input into a single group, wherever they occur.
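To illustrate the grouping behaviour, here is a small sketch on a made-up list (the names values and runs are only for this example). Note that the value 1 appears in two separate groups because its occurrences are not adjacent:

```python
import itertools

values = [1, 1, 2, 1, 1, 1]

# Each group is one run of consecutive equal values: (value, run length)
runs = [(key, len(list(group))) for key, group in itertools.groupby(values)]
print(runs)  # [(1, 2), (2, 1), (1, 3)]
```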

Another important feature is that you can assign any value to N: in every run of at least N consecutive equal values, only the first N values are replaced, and shorter runs are left untouched.

To test my code, I set N = 4 and defined the source array as:

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])

Note that it contains 5 consecutive values of 2 at the end.

Then, to get the expected result, run:

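import itertools

N = 4  # number of values to replace at the start of each long enough run

rv = []
for key, grp in itertools.groupby(x):  # one group per run of equal values
    lst = list(grp)
    lgth = len(lst)
    if lgth >= N:  # runs shorter than N are kept unchanged
        lst[0:N] = [0] * N
    rv.extend(lst)
xNew = np.array(rv)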

The result is:

[0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 0, 0, 0, 0, 2]

Note that a sequence of 4 zeroes occurs:

  • at the beginning (all 4 values of 1 have been replaced),
  • almost at the end (of the 5 values of 2, the first 4 have been replaced).
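Since the question asked for a vectorized implementation, the same rule can also be sketched without a Python loop. This is not the groupby code above but a NumPy alternative that should give the same result, under the assumption that x is a 1-D array; the function name zero_first_n_of_long_runs is hypothetical:

```python
import numpy as np


def zero_first_n_of_long_runs(x, n):
    """Zero the first n values of every run of at least n equal values.

    Hypothetical helper; returns a new array and leaves x untouched.
    """
    x = np.asarray(x)
    out = x.copy()
    if x.size == 0:
        return out
    # Index at which each run of consecutive equal values starts
    starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
    # Length of each run (distance to the next start, or to the end)
    lengths = np.diff(np.r_[starts, x.size])
    # Spread each run's start and length over all elements of that run
    run_start = np.repeat(starts, lengths)
    run_length = np.repeat(lengths, lengths)
    offset = np.arange(x.size) - run_start  # position inside the run
    out[(run_length >= n) & (offset < n)] = 0
    return out


x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
x_new = zero_first_n_of_long_runs(x, 4)
```

For the test array and N = 4 this zeroes the same two stretches as the loop: the four leading 1s and the first four of the five trailing 2s.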