I am trying to create a random set of 25 numbers, which are between 2 and 25, and sum up to 100 in python.
This Question gives an answer, but it seems that the maximum number never ends up being close to 25.
I've tried creating a list, dividing each number, and recreating the list, but it essentially nullifies my min and max values since they end up getting divided by a number larger than 1 almost all of the time:
numbers = np.random.randint(low = 2, high = 25, size = 100, dtype = int)
scale = 100 / sum(numbers) #We want weights to add up to 100%
#Scale values
for value in numbers:
nums.append(value * scale)
Is there any way to do this? Thanks
CodePudding user response:
You haven't specified what probability distribution the numbers should have so this could be an easy valid way although very unlikely to yield numbers close to 25:
import numpy as np
numbers = np.full(25,2)
while numbers.sum() < 100:
i = np.random.randint(25)
if numbers[i] < 25: # almost guaranteed...
numbers[i] = 1
CodePudding user response:
Suppose you want a random list of 25 numbers (not necessarily integers) which add up to 100, with the constraint that each number is at least 2 and no more than 25.
First, note that we can change that into an equivalent problem where the numbers are only required to be non-negative, by generating 25 numbers between 0 and 23 which add up to 100-25*2, which is 50. Once we have that list, we just add two to each number; the new list will be between 2 and 25, and its sum will be 100 (because we added 2 to each of 25 numbers).
The second thing to note is that the probability of finding a number close to 25 in that list is pretty small, since that would require one number to have attracted almost half of the available total. (That claim is clearer if you look at the alternative formulation, 25 numbers between 0 and 23 which add up to 50. If one of those numbers is, say, 20, then the other 24 numbers add up to 30, which means that you have a distribution which looks more like the distribution of wealth in an unregulated market than a uniform distribution.)
Since we're going to generate a uniform sample, we can handle the maximum value by ignoring it until we generate the random list, and then checking to see if the incredibly unlikely biased sample showed up; if it did, we just toss out the result and try again. (That's called "rejection sampling", and it's a pretty common technique, which works adequately even if half the samples are rejected. The advantage of rejection sampling is that it does not introduce bias.)
So let's get back to the question of how to generate a uniformly distributed list of non-negative numbers with a given sum. As long as the numbers are from a very large universe of possible values (like all double-precision floating point numbers within the range), that's quite easy. Say we need N numbers which add up to k. We start by randomly generating N-1 numbers, each in the range (0, k). Then we sort that set of numbers, and put 0 at one end of the sorted list and k at the other end. Finally, we compute the adjacent differences between successive elements. That gives us a list of N numbers which add up to k, and it turns out that the random lists so generated are an almost uniform sample of the possibilities. (There is a tiny bias introduced by the fact that it is possible for the same random number to have been generated twice, leading to a zero in the final list of differences. The zero is not a problem; the problem is that the zero shows up slightly too often. But the probability of getting an exact zero is less than one in a hundred million, and it's only exact zeros whose frequency is biased.)
In summary:
from random import uniform
def gen_random_list(N=25, k=100, min=2, max=25):
assert(N * min <= k <= N * max)
adjusted_k = k - min * N
while True:
endpoints = sorted(uniform(0, adjusted_k) for i in range(N - 1))
values = [*(end - begin min
for begin, end in zip([0] endpoints,
endpoints [adjusted_k]))]
if all(v <= max for v in values):
return values
OK, what if we need a list of integers? In that case, it's much more likely that the above procedure will produce a zero, and the bias will be noticeable. To avoid that, we make two changes:
- Instead of adjusting the range so that the minimum is 0, we adjust it so that the minimum is 1. (Which works for integers, because there are no integers between 0 and 1.) Now, the adjusted sum will be k' = k - N * (min - 1).
- Second, instead of generating N - 1 independent random values, we generate a random selection of N - 1 different values from the half-open range [1, k') (using
random.sample
) Other than that, the algorithm is the same. Sort the generated list, compute adjacent differences, and verify that the maximum wasn't exceeded:
from random import sample
def gen_random_list_of_ints(N=25, k=100, min=2, max=25):
assert(N * min <= k <= N * max)
adjusted_k = k - (min - 1) * N
while True:
endpoints = sorted(sample(range(1, adjusted_k), N - 1))
values = [*(end - begin min - 1
for begin, end in zip([0] endpoints,
endpoints [adjusted_k]))]
if all(v <= max for v in values):
return values
CodePudding user response:
Simplest way to do that for integers is to use Multinomial distribution, which has nice property to sum up to desired number. First, take out minimum value to get range [0...s], and then just use multinomial and reject sample exceeding max value. You could play with probabilities array p to get desired behavior.
As already noted, mean value would be 4.
Code, Python 3.10, Windows x64
import numpy as np
N = 25
minv = 2
maxv = 25
summ = 100
def sampling(N, minv, maxv, summ):
summa = summ - N*minv # fix range to [0...]
p = np.full(N, 1.0/N) # probabilities, uniform
while True:
t = np.random.multinomial(summa, p, size=1) minv # back to desired range
if np.any(t > maxv): # check
continue # and reject
return t
q = sampling(N, minv, maxv, summ)
print(np.sum(q))