I want to replace the first N identical consecutive numbers in an array with 0.
import numpy as np
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
OUT -> np.array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
A loop works, but what would be a faster, vectorized implementation?
i = 0
first = x[0]
# Check the bound first so x[i] is never read past the end of the array
while i < x.size and x[i] == first:
    x[i] = 0
    i += 1
CodePudding user response:
You can use argmax on a boolean array to get the index of the first value that differs from x[0].
Then slice and replace:
n = (x!=x[0]).argmax() # 4
x[:n] = 0
output:
array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
intermediate array:
(x!=x[0])
# n=4
# [False False False False True True True True True True True True
# True True True True True True True]
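One edge case worth noting: argmax returns 0 when the boolean array contains no True at all, so if every element equals x[0] the slice x[:0] is empty and nothing is replaced. Below is a minimal sketch of the same idea with that case handled. The function name zero_leading_run is hypothetical, and returning a copy instead of modifying x in place is an assumption made for illustration.

```python
import numpy as np

def zero_leading_run(x):
    """Return a copy of x with its leading run of identical values set to 0.

    Hypothetical helper, not part of NumPy.
    """
    out = np.asarray(x).copy()
    if out.size == 0:
        return out
    changed = out != out[0]
    # argmax gives 0 when no element is True, so treat the
    # all-equal case explicitly: the run spans the whole array.
    n = changed.argmax() if changed.any() else out.size
    out[:n] = 0
    return out

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
print(zero_leading_run(x))
print(zero_leading_run(np.array([5, 5, 5])))
```

If the array is guaranteed to contain at least two distinct values, the two-line version above is sufficient.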
CodePudding user response:
My solution is based on itertools.groupby, so start with import itertools.
This function groups consecutive equal values, unlike e.g. the pandas version of groupby, which collects all equal values from the input into a single group.
Another important feature is that you can set N to any value: in every run of at least N consecutive equal values, only the first N are replaced, and shorter runs are left unchanged.
To test my code, I set N = 4 and defined the source array as:
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
Note that it contains 5 consecutive values of 2 at the end.
Then, to get the expected result, run:
rv = []
for key, grp in itertools.groupby(x):
    lst = list(grp)
    lgth = len(lst)
    if lgth >= N:
        lst[0:N] = [0] * N
    rv.extend(lst)
xNew = np.array(rv)
The result is:
[0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 0, 0, 0, 0, 2]
Note that a sequence of 4 zeroes occurs:
- at the beginning (all 4 values of 1 have been replaced),
- near the end (the first 4 of the 5 values of 2 have been replaced).
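Since the question asked for a vectorized implementation, here is a rough NumPy-only sketch that should give the same result as the groupby loop above. It is an alternative technique, not the groupby method itself: it finds where each run starts, derives the run lengths, and builds a mask from each element's run length and its offset inside the run. The function name zero_first_n_of_runs is hypothetical.

```python
import numpy as np

def zero_first_n_of_runs(x, n):
    """Zero the first n elements of every run of at least n equal values.

    Hypothetical helper; returns a new array and leaves x untouched.
    """
    x = np.asarray(x)
    out = x.copy()
    if out.size == 0:
        return out
    # Indices where a new run of equal values begins
    starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
    # Length of each run
    lengths = np.diff(np.r_[starts, x.size])
    # For every element: the length of its run and its offset within that run
    run_len = np.repeat(lengths, lengths)
    offset = np.arange(x.size) - np.repeat(starts, lengths)
    out[(run_len >= n) & (offset < n)] = 0
    return out

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
print(zero_first_n_of_runs(x, 4))
```

Whether this is actually faster than the groupby loop depends on the array size, so it is worth timing on real data.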