Generate random numbers list with limit on each element and on total

Assume I have a list of values, for example:

limits = [10, 6, 3, 5, 1]

For every item in limits, I need to generate a random number less than or equal to the item. However, the catch is that the sum of elements in the new random list must be equal to a specified total.

For example if total = 10, then one possible random list is:

random_list = [2, 1, 3, 4, 0]

Here random_list has the same length as limits, every element in random_list is less than or equal to the corresponding element in limits, and sum(random_list) == total.
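The three conditions above can be written as a small check. This is only a sketch: the helper name is_valid is hypothetical, and the lower bound of 0 is an assumption taken from the example list rather than something stated explicitly.

```python
def is_valid(random_list, limits, total):
    """Return True if random_list meets the three conditions described above."""
    return (
        len(random_list) == len(limits)
        # Assumption: elements are non-negative, as in the example list.
        and all(0 <= r <= lim for r, lim in zip(random_list, limits))
        and sum(random_list) == total
    )


limits = [10, 6, 3, 5, 1]
ok = is_valid([2, 1, 3, 4, 0], limits, total=10)   # the example list
bad = is_valid([0, 0, 0, 8, 2], limits, total=10)  # 8 > 5 and 2 > 1
print(ok, bad)
```

A check like this is handy for testing any candidate generator against the requirements.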

How can I generate such a list? I am open to (and would prefer) using numpy, scipy, or pandas.

CodePudding user response：

As a starting point, you can use numpy's random.multinomial function. It distributes a fixed number of draws (the total) across bins according to a probability for each bin, so the resulting counts always sum to the specified total. It does not know about per-element limits, though, so those have to be checked separately.

For example, to generate a list of 5 random counts that sum to 10, with one bin per element of limits, you can use the following code:

import numpy as np

limits = [10, 6, 3, 5, 1]
total = 10

# pvals must sum to 1, so normalize the limits into probabilities
pvals = np.array(limits) / np.sum(limits)
random_list = np.random.multinomial(total, pvals)

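This will generate a list of 5 random numbers that sum to 10. The limits are not guaranteed, however: a count can exceed its limit (the last element, for instance, can come out greater than 1), so any draw that violates a limit has to be rejected and redrawn.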
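Because multinomial by itself does not enforce per-element limits, one way to use it here is rejection sampling: redraw until every count fits. The following is a minimal sketch, not a definitive implementation. The function name sample_with_limits, the max_tries parameter, the seed, and the choice of probabilities proportional to the limits are all assumptions made for illustration.

```python
import numpy as np


def sample_with_limits(limits, total, rng, max_tries=10_000):
    """Redraw multinomial samples until every count is within its limit."""
    limits = np.asarray(limits)
    if total > limits.sum():
        raise ValueError("total exceeds sum(limits); no valid list exists")
    # Assumed weighting: bin probabilities proportional to the limits.
    pvals = limits / limits.sum()
    for _ in range(max_tries):
        candidate = rng.multinomial(total, pvals)
        if np.all(candidate <= limits):
            return candidate
    raise RuntimeError("no valid sample found; try a larger max_tries")


rng = np.random.default_rng(0)  # seed chosen arbitrarily
random_list = sample_with_limits([10, 6, 3, 5, 1], 10, rng)
print(random_list, random_list.sum())
```

Rejection becomes slow when total is close to sum(limits), since most draws then violate at least one limit.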

Alternatively, you could use numpy's random.randint function to generate random numbers that are less than or equal to the corresponding element in the limits list, and then use a loop to adjust the numbers one step at a time until the sum equals the specified total. This approach would look something like this:

import numpy as np

limits = [10, 6, 3, 5, 1]
total = 10

random_list = []

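# Draw an integer in [0, limit] for each element
# (randint's upper bound is exclusive, hence limit + 1)
for limit in limits:
    random_list.append(np.random.randint(limit + 1))

# Nudge random elements up or down by 1 until the sum equals the total,
# never leaving the range [0, limit]
while sum(random_list) != total:
    i = np.random.randint(len(random_list))
    if sum(random_list) < total and random_list[i] < limits[i]:
        random_list[i] += 1
    elif sum(random_list) > total and random_list[i] > 0:
        random_list[i] -= 1

Of the two approaches, only the second keeps every element within its limit on its own; multinomial needs an extra check. Neither samples uniformly from all valid lists, and both assume total <= sum(limits).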

EDIT FOR @gerges

To generate a list of random numbers that sum to a specified total and are less than or equal to the corresponding element in the limits list, you can use a combination of the numpy functions random.multinomial and random.randint.

Here is an example of how you could do this:

import numpy as np

limits = [10, 6, 3, 5, 1]
total = 10

# Generate a list of random numbers that sum to the total using the multinomial function
# (pvals must sum to 1, so the limits are normalized into probabilities)
pvals = np.array(limits) / np.sum(limits)
random_list = np.random.multinomial(total, pvals)

# multinomial ignores the limits, so clip each number to its limit, then use
# the randint function to hand the clipped excess to elements that still have room
random_list = np.minimum(random_list, limits)
while random_list.sum() < total:
    i = np.random.randint(len(limits))
    if random_list[i] < limits[i]:
        random_list[i] += 1

# Check that the sum of the numbers in the list equals the specified total and that each number is less than or equal to the corresponding limit
assert sum(random_list) == total
for i, number in enumerate(random_list):
    assert number <= limits[i]

This approach generates a list that sums to the total using the multinomial function, clips each number to its limit, and then uses the randint function to give the clipped excess to elements that still have room. As long as total <= sum(limits), the resulting list sums to the specified total and every element is less than or equal to the corresponding element in the limits list.

CodePudding user response：

Found what I was looking for: the multivariate hypergeometric distribution, which is similar to the multinomial but samples without replacement.

The distribution is available in numpy:

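import numpy as np

limits = [10, 6, 3, 5, 1]
total = 10
seed = 42  # any integer works; the original seed is not given

gen = np.random.Generator(np.random.PCG64(seed))
random_list = gen.multivariate_hypergeometric(limits, total)

# one possible result (depends on the seed): array([4, 4, 1, 1, 0])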

Also, to make sure I didn't misunderstand the distribution, I ran a sanity check with 10 million samples and checked that the maximum of each element always stays within the limits:

res = gen.multivariate_hypergeometric(limits, total, size=10000000) 

res.max(axis=0)

# array([10,  6,  3,  5,  1])

which is the same as limits, so no element ever exceeded its limit.
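One way to see why the limits hold is the urn picture behind the multivariate hypergeometric distribution: put limits[i] balls of colour i into an urn, draw total balls without replacement, and count the colours. A colour can never be drawn more often than it has balls. The sketch below spells that out with rng.choice; the variable names and the seed are illustrative assumptions, and this is meant as an explanation rather than a replacement for multivariate_hypergeometric.

```python
import numpy as np

limits = [10, 6, 3, 5, 1]
total = 10
rng = np.random.default_rng(0)  # seed chosen arbitrarily

# Build an urn holding limits[i] balls labelled i (25 balls in total here).
urn = np.repeat(np.arange(len(limits)), limits)

# Draw `total` balls without replacement and count how many carry each label.
drawn = rng.choice(urn, size=total, replace=False)
random_list = np.bincount(drawn, minlength=len(limits))

print(random_list, random_list.sum())
```

Since each label i appears only limits[i] times in the urn, random_list[i] <= limits[i] holds by construction, and the counts always add up to total.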