Is there a performant way to generate an unbiased 64-bit random integer with no 3 set bits in a row, assuming a fast and unbiased input PRNG? I don't care about 'wasting bits' of the input source.
That is, something better than the naive rejection-sampling approach:
    uint64_t r;
    do {
        r = get_rand_64();
    } while (r & (r >> 1) & (r >> 2));
...which "works", but is very slow. It looks like it's iterating ~187x on average or so.
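As a sanity check on that estimate, the number of valid 64-bit values can be counted with a small dynamic program over the number of trailing set bits. This is only a sketch; `CountNo111` and `ExpectedDraws` are hypothetical helper names, not part of the code above.

```cpp
#include <cmath>
#include <cstdint>

// Number of `bits`-wide values with no three set bits in a row.
// Values are classified by how many set bits they end in (0, 1 or 2).
std::uint64_t CountNo111(int bits) {
  std::uint64_t end0 = 1, end1 = 0, end2 = 0;  // the empty value
  for (int i = 0; i < bits; i++) {
    std::uint64_t next0 = end0 + end1 + end2;  // append a 0 to anything
    std::uint64_t next1 = end0;                // append a 1 after a 0
    std::uint64_t next2 = end1;                // append a second 1
    end0 = next0;
    end1 = next1;
    end2 = next2;
  }
  return end0 + end1 + end2;
}

// Expected number of draws for rejection sampling: 2^bits / CountNo111(bits).
double ExpectedDraws(int bits) {
  return std::ldexp(1.0, bits) / static_cast<double>(CountNo111(bits));
}
```

`ExpectedDraws(64)` should land close to the ~187 figure quoted above, and `CountNo111(3)` gives the 7 valid 3-bit values.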
One possibility I've explored is roughly:
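    bool p2 = get_rand_bit();
    bool p1 = get_rand_bit();
    uint64_t r = (p1 << 1) | p2;
    for (int i = 2; i < 64; i++) {
        // Force a 0 after two set bits; otherwise draw a fresh bit.
        bool p0 = (p1 && p2) ? false : get_rand_bit();
        // Widen before shifting: shifting an int by 32 or more is undefined.
        r |= (uint64_t)p0 << i;
        p2 = p1;
        p1 = p0;
    }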
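...however, this is still slow, mainly because the entire calculation is bit-serial. EDIT: it's also biased. This is easiest to see with a 3-bit integer: 0b011 occurs 1/4 of the time (whenever the first two bits both come up set, the third is forced to zero), while each of the other six valid values occurs 1/8 of the time. Every valid value should occur 1/7 of the time.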
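The bias can be checked exactly rather than by sampling. The sketch below computes the probability that the bit-serial procedure emits a given value, assuming each call to `get_rand_bit()` is an independent fair coin flip; `SerialProbability` is a hypothetical name introduced for illustration.

```cpp
#include <cstdint>

// Exact probability that the bit-serial generator outputs `value` when it
// builds a `bits`-wide result. Bit i is forced to 0 when bits i-1 and i-2
// are both set; every other bit is a fair coin flip.
double SerialProbability(std::uint64_t value, int bits) {
  double p = 1.0;
  for (int i = 0; i < bits; i++) {
    bool bit = (value >> i) & 1;
    bool forced =
        i >= 2 && ((value >> (i - 1)) & 1) && ((value >> (i - 2)) & 1);
    if (forced) {
      if (bit) return 0.0;  // three set bits in a row are never produced
    } else {
      p *= 0.5;  // a free bit matches with probability 1/2
    }
  }
  return p;
}
```

Comparing `SerialProbability(v, 3)` against `1.0 / 7` for each 3-bit `v` shows which values get more or less than their share; the same function works for the 4-bit case.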
I've tried doing various parallel fixups, but haven't been able to come up with anything unbiased. It's useful to play around with 4-bit integers first: for example, setting all bits involved in a conflict to random values ends up biased, and drawing out the Markov chain for 4 bits makes that obvious.
Is there a better way to do this?
CodePudding user response:
The idea behind the code below is to generate the upper 32 bits with the proper (non-uniform!) distribution, then generate the lower 32 conditional on the upper. On my laptop, it’s significantly faster than the baseline.
You can see the logic behind the non-uniform upper distribution with 4-bit outputs: 00 and 10 have four 2-bit lowers, 01 has three lowers, and 11 has two lowers.
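Those counts can be reproduced by brute force for small widths. This is a minimal sketch, not part of the answer's code; `HasNo111` and `CountLowers` are hypothetical helper names.

```cpp
#include <cstdint>

// True if `value` has no three set bits in a row.
bool HasNo111(std::uint64_t value) {
  return (value & (value >> 1) & (value >> 2)) == 0;
}

// Brute-force count of the `lower_bits`-wide lower halves that can follow
// `upper` without creating three set bits in a row (small widths only).
std::uint64_t CountLowers(std::uint64_t upper, int lower_bits) {
  const std::uint64_t limit = std::uint64_t{1} << lower_bits;
  std::uint64_t count = 0;
  for (std::uint64_t lower = 0; lower < limit; lower++) {
    if (HasNo111((upper << lower_bits) | lower)) count++;
  }
  return count;
}
```

With `lower_bits = 2`, the uppers 00, 01, 10 and 11 give 4, 3, 4 and 2 respectively, matching the counts above. Only the two lowest bits of the upper half matter, which is why the 64-bit version only needs to switch on `upper & 3`.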
#include <cstdint>
#include <limits>
#include <random>

namespace {

// Baseline: redraw the whole value until it has no three set bits in a row.
template <typename T, typename URBG> T GenerateBaseline(URBG &gen) {
  T r;
  do {
    r = std::uniform_int_distribution<T>{std::numeric_limits<T>::min(),
                                         std::numeric_limits<T>::max()}(gen);
  } while (r & (r >> 1) & (r >> 2));
  return r;
}
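// Tribonacci(n) is the number of n-bit values with no three set bits in a
// row: 1, 2, 4, 7, 13, ... Note that b starts at 1 so that Tribonacci(32)
// counts the valid 32-bit lower halves; with b = 0 every index is off by one.
constexpr std::uint64_t Tribonacci(int n) {
  std::uint64_t a = 1;
  std::uint64_t b = 1;
  std::uint64_t c = 0;
  for (int i = 0; i < n; i++) {
    std::uint64_t sum = a + b + c;
    c = b;
    b = a;
    a = sum;
  }
  return a;
}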
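// Draws the upper 32 bits, accepting each candidate with probability
// proportional to the number of valid lower halves that can follow it.
template <typename URBG> std::uint32_t GenerateUpper(URBG &gen) {
  for (;;) {
    auto upper = GenerateBaseline<std::uint32_t>(gen);
    switch (upper & 3) {
    case 0:
    case 2:
      // Lowest bit clear: any valid lower half may follow.
      return upper;
    case 1:
      // Ends in 01: the lower half must not start with 11.
      if (std::uniform_int_distribution<std::uint32_t>{1, Tribonacci(32)}(
              gen) <= Tribonacci(32 - 1) + Tribonacci(32 - 2)) {
        return upper;
      }
      break;
    case 3:
      // Ends in 11: the lower half must start with 0.
      if (std::uniform_int_distribution<std::uint32_t>{1, Tribonacci(32)}(
              gen) <= Tribonacci(32 - 1)) {
        return upper;
      }
      break;
    }
  }
}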
// Fixes the upper half, then redraws only the lower half until the
// combined value has no three set bits in a row.
template <typename URBG> std::uint64_t Generate(URBG &gen) {
  auto upper = std::uint64_t{GenerateUpper(gen)} << 32;
  std::uint64_t r;
  do {
    r = upper + std::uniform_int_distribution<std::uint32_t>{
                    std::numeric_limits<std::uint32_t>::min(),
                    std::numeric_limits<std::uint32_t>::max()}(gen);
  } while (r & (r >> 1) & (r >> 2));
  return r;
}

} // namespace
int main() {
  std::mt19937 gen{std::random_device{}()};
  for (std::int32_t i = 0; i < 100000; i++) {
    // Change to true to time the baseline instead.
    if (false) {
      GenerateBaseline<std::uint64_t>(gen);
    } else {
      Generate(gen);
    }
  }
}
CodePudding user response:
From @John Coleman's comment, here's the start of an approach based on Tribonacci numbers. Basic idea:
- Generate an unbiased number in the range [0..T(bits)), where T(0) = 1, T(1) = 2, T(2) = 4, T(n) = T(n-1) + T(n-2) + T(n-3).
- Convert to Tribonacci representation.
- You're done.
A minimal example is as follows:
#include <stdint.h>

// TRIBO[0..2] = 1, 2, 4; TRIBO[n] = TRIBO[n-3] + TRIBO[n-2] + TRIBO[n-1]
// possible minor perf optimization: reverse TRIBO
static const uint64_t TRIBO[65] = {1, 2, 4, 7, 13, 24, 44, 81, 149, 274, 504, 927, 1705, 3136, 5768, 10609, 19513, 35890, 66012, 121415, 223317, 410744, 755476, 1389537, 2555757, 4700770, 8646064, 15902591, 29249425, 53798080, 98950096, 181997601, 334745777, 615693474, 1132436852, 2082876103, 3831006429, 7046319384, 12960201916, 23837527729, 43844049029, 80641778674, 148323355432, 272809183135, 501774317241, 922906855808, 1697490356184, 3122171529233, 5742568741225, 10562230626642, 19426970897100, 35731770264967, 65720971788709, 120879712950776, 222332455004452, 408933139743937, 752145307699165, 1383410902447554, 2544489349890656, 4680045560037375, 8607945812375585, 15832480722303616, 29120472094716576, 53560898629395777, 98513851446415969];
// exclusive of max
extern uint64_t get_rand_64_range(uint64_t max);

uint64_t get_rand_no111(void) {
    uint64_t idx = get_rand_64_range(TRIBO[64]);
    uint64_t ret = 0;
    for (int i = 63; i >= 0; i--) {
        if (idx >= TRIBO[i]) {
            ret |= ((uint64_t) 1) << i;
            idx -= TRIBO[i];
        }
        // optional: if (idx == 0) {break;}
    }
    return ret;
}
(Warning: retyped from Python code. I suggest testing.)
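One way to test it is to run the same greedy conversion at a small width and check every index exhaustively. The sketch below is written in C++ for convenience and rebuilds the weights rather than using `TRIBO`; `TriboTable`, `FromIndex` and `CheckWidth` are hypothetical names introduced for the check.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Tribonacci weights up to index `bits`: 1, 2, 4, then the sum of the
// previous three entries.
std::vector<std::uint64_t> TriboTable(int bits) {
  std::vector<std::uint64_t> t = {1, 2, 4};
  for (int n = 3; n <= bits; n++) {
    t.push_back(t[n - 1] + t[n - 2] + t[n - 3]);
  }
  return t;
}

// Greedy conversion of idx in [0, T(bits)) to a `bits`-wide value.
std::uint64_t FromIndex(std::uint64_t idx, int bits) {
  const std::vector<std::uint64_t> t = TriboTable(bits);
  std::uint64_t ret = 0;
  for (int i = bits - 1; i >= 0; i--) {
    if (idx >= t[i]) {
      ret |= std::uint64_t{1} << i;
      idx -= t[i];
    }
  }
  return ret;
}

// True if every index maps to a distinct value with no three set bits in a
// row, i.e. the conversion is a bijection onto the valid values.
bool CheckWidth(int bits) {
  const std::uint64_t total = TriboTable(bits)[bits];
  std::set<std::uint64_t> seen;
  for (std::uint64_t idx = 0; idx < total; idx++) {
    const std::uint64_t v = FromIndex(idx, bits);
    if (v & (v >> 1) & (v >> 2)) return false;  // found 111
    if (!seen.insert(v).second) return false;   // duplicate output
  }
  return true;
}
```

`CheckWidth(bits)` is practical for widths up to roughly 20 bits. Passing there does not prove the 64-bit table was retyped correctly, so the recurrence on `TRIBO` is worth checking separately.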
This satisfies the 'unbiased' portion, and is indeed faster than the naive rejection-sampling approach, but unfortunately is still pretty slow, because it's looping ~64 times.