I recently read an article explaining how to generate a weighted random number, and there's a piece of the code that I don't understand:
int r = ((int)(rand.Next() * (323567)) % prefix[n - 1]) 1;
Why is rand.Next being multiplied by a constant 323567? Would the code work without this constant? Below is the full code for reference, and you can find the full article here: https://www.geeksforgeeks.org/random-number-generator-in-arbitrary-probability-distribution-fashion/
Any help is appreciated, thank you!!
// C# program to generate random numbers
// according to given frequency distribution
using System;
class GFG{
// Utility function to find ceiling
// of r in arr[l..h]
static int findCeil(int[] arr, int r,
int l, int h)
{
int mid;
while (l < h)
{
// Same as mid = (l h)/2
mid = l ((h - l) >> 1);
if (r > arr[mid])
l = mid 1;
else
h = mid;
}
return (arr[l] >= r) ? l : -1;
}
// The main function that returns a random number
// from arr[] according to distribution array
// defined by freq[]. n is size of arrays.
static int myRand(int[] arr, int[] freq, int n)
{
// Create and fill prefix array
int[] prefix = new int[n];
int i;
prefix[0] = freq[0];
for(i = 1; i < n; i)
prefix[i] = prefix[i - 1] freq[i];
// prefix[n-1] is sum of all frequencies.
// Generate a random number with
// value from 1 to this sum
Random rand = new Random();
int r = ((int)(rand.Next() * (323567)) % prefix[n - 1]) 1; // <--- RNG * Constant
// Find index of ceiling of r in prefix array
int indexc = findCeil(prefix, r, 0, n - 1);
return arr[indexc];
}
// Driver Code
static void Main()
{
int[] arr = { 1, 2, 3, 4 };
int[] freq = { 10, 5, 20, 100 };
int i, n = arr.Length;
// Let us generate 10 random numbers
// according to given distribution
for(i = 0; i < 5; i )
Console.WriteLine(myRand(arr, freq, n));
}
}
UPDATE: I ran this code to check it:
int[] intArray = new int[] { 1, 2, 3, 4, 5 };
int[] weights = new int[] { 5, 20, 20, 40, 15 };
List<int> results = new List<int>();
for (int i = 0; i < 100000; i )
{
results.Add(WeightedRNG.GetRand(intArray, weights, intArray.Length));
}
for (int i = 0; i < intArray.Length; i )
{
int itemsFound = results.Where(x => x == intArray[i]).Count();
Console.WriteLine($"{intArray[i]} was returned {itemsFound} times.");
}
And here are the results with the constant:
1 was returned 5096 times.
2 was returned 19902 times.
3 was returned 20086 times.
4 was returned 40012 times.
5 was returned 14904 times.
And without the constant...
1 was returned 100000 times.
2 was returned 0 times.
3 was returned 0 times.
4 was returned 0 times.
5 was returned 0 times.
It completely breaks without it.
CodePudding user response:
The constant does serve a purpose in some environments, but I don't believe this code is correct for C#.
To explain, let's look at the arguments to the function. The first sign something is off is passing n
as an argument instead of inferring it from the arrays. The second sign is it's poor practice in C# to deal with paired arrays rather than something like a 2D array or sequence of single objects (such as a Tuple). But those are just indicators something is odd, and not evidence of any bugs.
So let's put that on hold for a moment and explain why a constant might matter by looking a small example.
Say you have three numbers (1, 2, and 3) with weights 3, 2, and 2. This function first builds up a prefix
array, where each item includes the chances of finding the number for that index and all previous numbers.
We end up with a result like this: (3, 5, 7)
. Now we can use the last value and take a random number from 1 to 7. Values 1-3 result in 1
, values 4 and 5 result in 2
, and values 6 and 7 result in 3
.
To find this random number the code now calls rand.Next()
, and this is where I think the error comes in. In many platforms, the Next()
function returns a double between 0 and 1. That's too small to use to lookup your weighted value, so you then multiply by a prime constant related the platform's epsilon value to ensure you have a reasonably large result that will cover the entire possible range of desired weights (in this case, 1-7) and then some. Now you take the remainder (%
) vs your max weight (7
), and map it via the prefix array to get the final result.
So the first error is, in C#, .Next()
does not return a double. It is documented to return a non-negative random integer between 0
and int.MaxValue
. Multiply that by 323567
and you're asking for integer overflow exceptions. Another sign of this mistake is the cast to int
: the result of this function is already an int. And let's not even talk the meaningless extra parentheses around (323567)
.
There is also another, more subtle, error.
Let's the say the result of the (int)(rand.Next() * 323567)
expression is 10
. Now we take this value and get the remainder when dividing by our max value (%7
). The problem here is we have two chances to roll a 1, 2, or 3 (the extra chance is if the original was 8, 9, or 10), and only once chance for the remaining weights (4-7). So we have introduced unintended bias into the system, completely throwing off the expected weights. (This is a simplification. The actual number space is not 1-10; it's 0 * 323567 - 0.999999 * 323567. But the potential for bias still exists as long that max value is not evenly divisible by the max weight value).
It's possible the constant was chosen because it has properties to minimize this effect, but again: it was based on a completely different kind of .Next()
function.
Therefore the rand.Next()
line should probably be changed to look like this:
int r = rand.Next(prefix[n - 1]) 1;
or this:
int r = ((int)(rand.NextDouble() * (323567 * prefix[n - 1])) % prefix[n - 1]) 1;
But, given the other errors, I'd be wary of using this code at all.