My problem is as follows. I am generating a random bitstring of size n and need to iterate over the indices at which the bit is 1. For example, if my random bitstring ends up being 00101, I want to retrieve [2, 4] (which I will then iterate over). The goal is to do this as fast as possible with Python/NumPy.
One of the fast methods is to use NumPy and do
bitstring = np.random.randint(2, size=(n,))
l = np.nonzero(bitstring)[0]
The advantage of np.nonzero is that it finds the indices of the bits set to 1 much faster than iterating over each bit with a for loop and checking whether it is set.
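As a quick sanity check, a minimal sketch using the 00101 example from above:

```python
import numpy as np

# the example bitstring 00101 as an array; positions 2 and 4 hold a 1
bitstring = np.array([0, 0, 1, 0, 1])
indices = np.nonzero(bitstring)[0]
print(indices)  # → [2 4]
```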
Now, NumPy can generate a random bitstring faster via np.random.bit_generator.randbits(n). The problem is that it returns the result as an integer, on which I can no longer use np.nonzero. I saw that for integers one can get the count of bits set to 1 in an integer x using x.bit_count(), but there is no function to get the indices at which bits are set. So currently I have to resort to a slow for loop, losing the initial speedup given by np.random.bit_generator.randbits(n).
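For illustration, a minimal sketch of the kind of slow loop I mean (note it numbers bits from the least-significant end, unlike the string example above):

```python
x = 0b00101  # the integer 5

# test each bit position up to the highest set bit
indices = [i for i in range(x.bit_length()) if (x >> i) & 1]
print(indices)  # → [0, 2]
```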
How would you do something similar to (and as fast as) np.nonzero, but on integers instead?
Thank you in advance for your suggestions!
CodePudding user response:
You could convert the number you get from randbits(n) into a numpy.ndarray. Depending on the size of n, the conversion should be faster than the loop.
n = 10
l = np.random.bit_generator.randbits(n)          # e.g. the int 616
l_string = f'{l:0{n}b}'                          # zero-padded binary string of length n, e.g. '1001101000'
l_nparray = np.array(list(l_string), dtype=int)  # ndarray like np.random.randint: [1 0 0 1 1 0 1 0 0 0]
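The converted array can then go through np.nonzero as in the question; a sketch using the fixed integer 616 in place of the random draw:

```python
import numpy as np

n = 10
l = 616                                          # stand-in for the random integer
l_string = f'{l:0{n}b}'                          # '1001101000'
l_nparray = np.array(list(l_string), dtype=int)
indices = np.nonzero(l_nparray)[0]
print(indices)  # → [0 3 4 6]
```

Note that this numbering follows the string representation (most-significant bit first), matching the question's 00101 → [2, 4] convention.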
CodePudding user response:
A minor optimisation to your code would be to use the new-style random interface and generate bools rather than 64-bit integers:
rng = np.random.default_rng()
def original(n):
    bitstring = rng.integers(2, size=n, dtype=bool)
    return np.nonzero(bitstring)[0]
This causes it to take ~24 µs on my laptop, tested with n up to 128.
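The exact figure is machine-dependent; one way to reproduce the measurement is a quick timeit run (a sketch, not the answerer's actual benchmark):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng()

def original(n):
    # generate booleans directly instead of 64-bit integers
    bitstring = rng.integers(2, size=n, dtype=bool)
    return np.nonzero(bitstring)[0]

# average time per call for n = 128
t = timeit(lambda: original(128), number=10_000) / 10_000
print(f"{t * 1e6:.2f} µs per call")
```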
I've previously noticed that getting NumPy to generate a permutation is particularly fast, hence my comment above. Leading to:
def perm(n):
    a = rng.permutation(n)
    return a[:rng.binomial(n, 0.5)]
which takes between ~7 µs and ~10 µs depending on n. It also returns the indices out of order; not sure if that's an issue for you. If your n isn't changing much, you could also swap to using rng.shuffle on a pre-allocated array, something like:
n = 32
a = np.arange(n)
def shuffle():
    rng.shuffle(a)
    return a[:rng.binomial(n, 0.5)]
which saves a couple of microseconds.
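One caveat worth spelling out (my note, not from the answer above): the returned slice is a view into the pre-allocated array, so a later call to shuffle reorders the values a previous result points at. Copy the result if it has to survive the next call:

```python
import numpy as np

rng = np.random.default_rng()
n = 32
a = np.arange(n)

def shuffle():
    rng.shuffle(a)
    return a[:rng.binomial(n, 0.5)]

first = shuffle()
assert first.base is a     # `first` is a view into `a`, not a copy
snapshot = first.copy()    # take a copy if the result must persist
shuffle()                  # reshuffles `a`; `first` now reflects the new order
```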