How to select value from different ranges with equal probability-CodePudding

Provided different ranges, select each value with equal probability. Like say var 'a' can have values among { [10,20],40,[70,100]...} (given) . Each selected value by provided constraints should have same probability. How to get a random value in C?

CodePudding user response：

Giving each Range equal probabilistic chance:

Let N be the number of ranges you've defined in your problem-set. Ranges { R₀, R₁, R₂ ... R_N-1 }, Indexes start at 0.
Generate a random number, RandValue mod N to pick a range. In C, modulo operator is %, gives you integral remainder.
Is picked range just a number? (like 40 in your example)
- 3.1 Yes, then your random value is that number
- 3.2 No, it's a range. Find a random value within selected range.

Giving each value in all ranges equal probabilistic chance:

Let N be the number of values across all ranges.
Map each value to an index, Values { V₀, V₁, V₂ ... V_N-1 }, Indexes start at 0.
Use hash-tables for quick lookups. Also, you can handle overlapping ranges.
Generate a random number, RandValue mod N to pick a value-index.
Look up in hash-table for mapped value against the index.

Also, note that hash-table could become huge if the ranges are too large. In that case you may have to merge overlapping/consecutive (if any) ranges and maintain sorted(by value-index) list(array of structs) of ranges and assign index-ranges. Use binary search to find the range against random-index. Range offsets (start/end values & indexes) should give the final value for a given random-index.

PS: This is for trivial implementations of randomness in C projects. I believe all randomness is deterministic.

Edit: I agree, there is modulo-bias & to reject values beyond (RAND_MAX - RAND_MAX % N).

CodePudding user response：

Simple solution:

do
   r=rand();
until (is_in_range(r));

It's not at all efficient, and especially it's not bounded in running time. But it should work.

And sometimes simple and stupid solutions are good enough.

(Once you start doing things like r=rand()%limit;, then you start introducing skewed probabilities. Imagine doing r=rand()%((RAND_MAX/2) 1);. It's twice as likely to return anything below RAND_MAX/2 as RAND_MAX/2. See this answer for more detail. )

To improve performance, one could do something like what @Jakob Stark hinted at:

for(limit=1;limit<top_of_range;limit<<=1)
       ;  // Find the smallest power-of-two larger than the top_of_range
do
      r=rand()%limit;
while(!(is_in_range(r));

It's still not guaranteed to run in finite time, though...