Getting random integer without 3 set bits in a row

Is there a performant way to generate an unbiased 64b random integer without 3 set bits in a row, assuming a fast-and-unbiased input PRNG? I don't care about 'wasting bits' of the input source.

That is, something better than the naive rejection-sampling approach:

uint64_t r;
do {
    r = get_rand_64();
} while (r & (r >> 1) & (r >> 2));

...which "works", but is very slow. It looks like it's iterating ~187x on average or so.
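
The ~187x figure can be checked by counting the valid outputs. The sketch below is a rough check, not part of the original post: `CountValid` and `ExpectedDraws` are hypothetical names introduced here. It relies on the observation that a valid string starts with 0, 10, or 110 followed by a shorter valid string, and that rejection sampling needs 2^64 / count draws on average.

```cpp
#include <cstdint>

// Number of n-bit strings (0 <= n <= 64) with no three set bits in a row.
// A valid string starts with 0, 10, or 110 followed by a shorter valid
// string, so count(n) = count(n-1) + count(n-2) + count(n-3).
inline std::uint64_t CountValid(int n) {
  std::uint64_t a = 1, b = 1, c = 0;  // seeds chosen so count(1) = 2, count(2) = 4
  for (int i = 0; i < n; ++i) {
    std::uint64_t sum = a + b + c;
    c = b;
    b = a;
    a = sum;
  }
  return a;
}

// Expected number of draws when rejecting over all 2^64 candidate values.
inline double ExpectedDraws() {
  return 18446744073709551616.0 / static_cast<double>(CountValid(64));
}
```

Under these assumptions `CountValid(64)` is 98513851446415969, so `ExpectedDraws()` comes out a little above 187.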

One possibility I've explored is roughly:

bool p2 = get_rand_bit();
bool p1 = get_rand_bit();
uint64_t r = (p1 << 1) | p2;
for (int i = 2; i < 64; i++) {
    bool p0 = (p1 && p2) ? false : get_rand_bit();
    r |= (uint64_t)p0 << i;  // widen first: shifting an int by 31 or more is undefined
    p2 = p1;
    p1 = p0;
}

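...however, this is still slow, mainly because the entire calculation is bit-serial. EDIT: it's also biased. This is easiest to see with a 3-bit integer: 0b011 occurs 1/4 of the time (whenever the first two bits come up set, the third is forced to zero), while each of the other six valid values occurs 1/8 of the time. All seven should occur 1/7 of the time.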
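
To make the bias concrete, the sketch below follows every possible coin flip of the bit-serial loop for a small width and accumulates the exact probability of each output. `Walk` and `SerialDistribution` are hypothetical names introduced here for illustration; the logic mirrors the loop above but is not the asker's code.

```cpp
#include <cstdint>
#include <vector>

// Follows every possible coin flip of the bit-serial loop. `i` is the bit
// being decided, `r` the value built so far, and `p` the probability of
// reaching this state.
inline void Walk(int n, int i, std::uint32_t r, double p,
                 std::vector<double>& prob) {
  if (i == n) {
    prob[r] += p;
    return;
  }
  const bool p1 = i >= 1 && ((r >> (i - 1)) & 1u);
  const bool p2 = i >= 2 && ((r >> (i - 2)) & 1u);
  if (p1 && p2) {
    Walk(n, i + 1, r, p, prob);  // bit i is forced to 0, no coin flip
  } else {
    Walk(n, i + 1, r, p / 2, prob);              // coin came up 0
    Walk(n, i + 1, r | (1u << i), p / 2, prob);  // coin came up 1
  }
}

// Exact probability of each n-bit output (intended for small n, e.g. 2..16).
inline std::vector<double> SerialDistribution(int n) {
  std::vector<double> prob(1u << n, 0.0);
  Walk(n, 0, 0, 1.0, prob);
  return prob;
}
```

For n = 3 this gives 1/4 for 0b011 and 1/8 for each of the other six valid values, against the ideal 1/7 each.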

I've tried doing various parallel fixups, but haven't been able to come up with anything unbiased. It's useful to play around with 4-bit integers first. For example, setting all bits involved in a conflict to random values ends up biased, and drawing out the Markov chain for 4 bits makes that obvious.

Is there a better way to do this?

CodePudding user response:

The idea behind the code below is to generate the upper 32 bits with the proper (non-uniform!) distribution, then generate the lower 32 conditional on the upper. On my laptop, it’s significantly faster than the baseline.

You can see the logic behind the non-uniform upper distribution with 4-bit outputs: 00 and 10 have four 2-bit lowers, 01 has three lowers, and 11 has two lowers.
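
The 4-bit counts can be reproduced by brute force. This is a small illustrative sketch; `HasRunOfThree` and `CountLowers` are hypothetical helpers introduced here and do not appear in the answer's code.

```cpp
#include <cstdint>

// True if x contains three consecutive set bits.
inline bool HasRunOfThree(std::uint64_t x) {
  return (x & (x >> 1) & (x >> 2)) != 0;
}

// Number of half_bits-wide lower halves that keep the combined value
// (upper, lower) free of three set bits in a row. Brute force, so only
// meant for small half_bits.
inline int CountLowers(std::uint64_t upper, int half_bits) {
  int count = 0;
  const std::uint64_t limit = std::uint64_t{1} << half_bits;
  for (std::uint64_t lower = 0; lower < limit; ++lower) {
    if (!HasRunOfThree((upper << half_bits) | lower)) {
      ++count;
    }
  }
  return count;
}
```

With `half_bits` set to 2, the uppers 00, 01, 10, and 11 get 4, 3, 4, and 2 lowers respectively, 13 valid 4-bit values in total. An unbiased generator therefore has to pick the uppers with probabilities 4/13, 3/13, 4/13, and 2/13.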

#include <cstdint>
#include <limits>
#include <random>

namespace {

template <typename T, typename URBG> T GenerateBaseline(URBG &gen) {
  T r;
  do {
    r = std::uniform_int_distribution<T>{std::numeric_limits<T>::min(),
                                         std::numeric_limits<T>::max()}(gen);
  } while (r & (r >> 1) & (r >> 2));
  return r;
}

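// Tribonacci(n) is the number of n-bit strings with no three set bits in a
// row: 1, 2, 4, 7, 13, ... The seeds are chosen so that Tribonacci(32) counts
// the valid 32-bit lower halves, which is what GenerateUpper needs.
constexpr std::uint64_t Tribonacci(int n) {
  std::uint64_t a = 1;
  std::uint64_t b = 1;
  std::uint64_t c = 0;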
  for (int i = 0; i < n; ++i) {
    std::uint64_t sum = a + b + c;
    c = b;
    b = a;
    a = sum;
  }
  return a;
}

template <typename URBG> std::uint32_t GenerateUpper(URBG &gen) {
  for (;;) {
    auto upper = GenerateBaseline<std::uint32_t>(gen);
    switch (upper & 3) {
    case 0:
    case 2:
      return upper;
    case 1:
      if (std::uniform_int_distribution<std::uint32_t>{1, Tribonacci(32)}(
              gen) <= Tribonacci(32 - 1) + Tribonacci(32 - 2)) {
        return upper;
      }
      break;
    case 3:
      if (std::uniform_int_distribution<std::uint32_t>{1, Tribonacci(32)}(
              gen) <= Tribonacci(32 - 1)) {
        return upper;
      }
      break;
    }
  }
}

template <typename URBG> std::uint64_t Generate(URBG &gen) {
  auto upper = std::uint64_t{GenerateUpper(gen)} << 32;
  std::uint64_t r;
  do {
    r = upper + std::uniform_int_distribution<std::uint32_t>{
                    std::numeric_limits<std::uint32_t>::min(),
                    std::numeric_limits<std::uint32_t>::max()}(gen);
  } while (r & (r >> 1) & (r >> 2));
  return r;
}

} // namespace

int main() {
  std::mt19937 gen{std::random_device{}()};
  for (std::int32_t i = 0; i < 100000; i++) {
    if (false) {
      GenerateBaseline<std::uint64_t>(gen);
    } else {
      Generate(gen);
    }
  }
}

CodePudding user response:

From @John Coleman's comment, here's the start of an approach based on Tribonacci numbers. Basic idea:

Generate an unbiased number in the range [0..T(bits)), where T(0) = 1, T(1) = 2, T(2) = 4, T(n) = T(n-1) + T(n-2) + T(n-3).
Convert to Tribonacci representation.
You're done.

A minimal example is as follows:

#include <stdint.h>

// TRIBO[0..2] = 1, 2, 4; TRIBO[n] = TRIBO[n-3] + TRIBO[n-2] + TRIBO[n-1]
// possible minor perf optimization: reverse TRIBO
static const uint64_t TRIBO[65] = {1, 2, 4, 7, 13, 24, 44, 81, 149, 274, 504, 927, 1705, 3136, 5768, 10609, 19513, 35890, 66012, 121415, 223317, 410744, 755476, 1389537, 2555757, 4700770, 8646064, 15902591, 29249425, 53798080, 98950096, 181997601, 334745777, 615693474, 1132436852, 2082876103, 3831006429, 7046319384, 12960201916, 23837527729, 43844049029, 80641778674, 148323355432, 272809183135, 501774317241, 922906855808, 1697490356184, 3122171529233, 5742568741225, 10562230626642, 19426970897100, 35731770264967, 65720971788709, 120879712950776, 222332455004452, 408933139743937, 752145307699165, 1383410902447554, 2544489349890656, 4680045560037375, 8607945812375585, 15832480722303616, 29120472094716576, 53560898629395777, 98513851446415969];

// exclusive of max
extern uint64_t get_rand_64_range(uint64_t max);

uint64_t get_rand_no111(void) {
    uint64_t idx = get_rand_64_range(TRIBO[64]);
    uint64_t ret = 0;
    for (int i = 63; i >= 0; i--) {
        if (idx >= TRIBO[i]) {
            ret |= ((uint64_t) 1) << i;
            idx -= TRIBO[i];
        }
        // optional: if (idx == 0) {break;}
    }
    return ret;
}

(Warning: retyped from Python code. I suggest testing.)
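
Following up on the suggestion to test, here is one possible exhaustive check for small widths. It is a sketch under stated assumptions: `Weights`, `ToNo111`, and `CheckAllIndices` are hypothetical names introduced here, and `ToNo111` re-implements the greedy loop above for an arbitrary width rather than calling `get_rand_no111` itself.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Weights 1, 2, 4, 7, 13, ... up to index `bits` (same sequence as TRIBO).
inline std::vector<std::uint64_t> Weights(int bits) {
  std::vector<std::uint64_t> t = {1, 2, 4};
  for (int i = 3; i <= bits; ++i) {
    t.push_back(t[i - 1] + t[i - 2] + t[i - 3]);
  }
  return t;
}

// Greedy conversion of idx in [0, t[bits]) to a bits-wide value.
inline std::uint64_t ToNo111(std::uint64_t idx,
                             const std::vector<std::uint64_t>& t, int bits) {
  std::uint64_t ret = 0;
  for (int i = bits - 1; i >= 0; --i) {
    if (idx >= t[i]) {
      ret |= std::uint64_t{1} << i;
      idx -= t[i];
    }
  }
  return ret;
}

// Checks every index for a small width (roughly 1..20 bits): each result
// must fit in `bits` bits, contain no run of three set bits, and be unique.
inline bool CheckAllIndices(int bits) {
  const std::vector<std::uint64_t> t = Weights(bits);
  std::set<std::uint64_t> seen;
  for (std::uint64_t idx = 0; idx < t[bits]; ++idx) {
    const std::uint64_t v = ToNo111(idx, t, bits);
    if ((v >> bits) != 0) return false;          // spilled past the width
    if (v & (v >> 1) & (v >> 2)) return false;   // found a run of three
    if (!seen.insert(v).second) return false;    // duplicate output
  }
  return true;
}
```

If `CheckAllIndices` holds, the conversion is a one-to-one map from [0, T(bits)) onto the valid values of that width, so a uniform index gives a uniform output. It does not test the 64-bit table itself, only the technique.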

This satisfies the 'unbiased' requirement and is indeed faster than the naive rejection-sampling approach, but unfortunately it is still pretty slow, because the conversion loop runs 64 times per output.