Replace repeated values with counting up values in NumPy (vectorized)

Time:04-21

I have an array of repeated values that are used to match datapoints to some ID. How can I replace the IDs with counting up index values in a vectorized manner?

Consider the following minimal example:

import numpy as np

n_samples = 10

ids = np.random.randint(0,500, n_samples)
lengths = np.random.randint(1,5, n_samples)

x = np.repeat(ids, lengths)
print(x)

Output:

[129 129 129 129 173 173 173 207 207   5 430 147 143 256 256 256 256 230 230  68]

Desired solution:

indices = np.arange(n_samples)
y = np.repeat(indices, lengths)
print(y)

Output:

[0 0 0 0 1 1 1 2 2 3 4 5 6 7 7 7 7 8 8 9]

However, in the real code, I do not have access to variables like ids and lengths, but only x.

The actual values in x do not matter. I just want an array of counting-up integers, each repeated the same number of times as the corresponding value in x.

I can come up with solutions using for-loops or np.unique, but both are too slow for my use case.

Does anyone have an idea for a fast algorithm that takes an array like x and returns an array like y?

CodePudding user response:

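You can mark every position where the value differs from its predecessor and take the cumulative sum of those marks: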

y = np.r_[False, x[1:] != x[:-1]].cumsum()

Or with one less temporary array:

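y = np.empty(len(x), int)
y[0] = 0  # the first element always belongs to run 0
# Each True marks the start of a new run, so the running sum is the run index.
np.cumsum(x[1:] != x[:-1], out=y[1:])
print(y)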
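
For reference, here is a minimal, self-contained sketch of the same idea wrapped in a function. The name run_indices and the guard for arrays with fewer than two elements are my own additions for illustration, not part of the snippets above, and the sketch assumes x is one-dimensional.

```python
import numpy as np

def run_indices(x):
    """Label each run of consecutive equal values in a 1-D array with 0, 1, 2, ...

    Hypothetical helper name; assumes x is one-dimensional.
    """
    x = np.asarray(x)
    # Zero-initialised, so empty and single-element inputs need no special value.
    y = np.zeros(len(x), dtype=int)
    if len(x) > 1:
        # True wherever a value differs from its predecessor (a new run starts);
        # the cumulative sum of those flags is the run index.
        np.cumsum(x[1:] != x[:-1], out=y[1:])
    return y

x = np.array([129, 129, 129, 129, 173, 173, 173, 207, 207, 5])
print(run_indices(x))  # [0 0 0 0 1 1 1 2 2 3]
```

One caveat: this only looks at neighbouring elements, so if two adjacent groups in x happen to share the same ID, they are merged into a single run. That cannot be detected from x alone.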