Create a list of N random numbers with max and min value and total sum

I'm trying to create a list (call it weights) of N random numbers between 0.005 and 0.045 whose total sum equals 1. N can be any integer between 22 and 200. So the restrictions are:

Number of numbers in weights = N
For every n in weights: 0.005 < n < 0.045
sum of all n's in weights = 1

The first restriction is easy, I think. I also know how to satisfy the second and the third restriction separately, but I don't know how to combine them into one piece of code.

Second restriction: 0.005 < x < 0.045:

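import numpy as np

N = 50  # example value

# randint takes low/high (high is exclusive): integers 6..44 scale to
# 0.006..0.044, so every weight is strictly between 0.005 and 0.045.
weights_step1 = np.random.randint(low=6, high=45, size=N)

weights = [weight / 1000 for weight in weights_step1]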

Third restriction (sum equal to 1), as covered in:

Generating a list of random numbers, summing to 1
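
For context, one common way to meet the sum restriction on its own is to draw random values and divide by their total. The sketch below assumes that approach; it is not quoted from the linked question, and `N = 50` and the seed are example values. The rescaling does nothing to keep each value between 0.005 and 0.045, which is why the two restrictions are awkward to combine.

```python
import numpy as np

N = 50                               # example value, not from the question
rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

raw = rng.random(N)        # uniform samples in [0, 1)
weights = raw / raw.sum()  # rescale so the values sum to 1

print(weights.sum())                 # equals 1 up to floating-point rounding
print(weights.min(), weights.max())  # not constrained to 0.005 .. 0.045
```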

Does anyone know how to get both restrictions into one piece of code?

CodePudding user response：

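This might do the trick. The strategy is to start from a uniform partition of the total (every weight equal to 1/N), then iterate over the weights and, for each one, move a random amount from it to a randomly chosen partner. The amount is capped so that the giver never drops below the minimum and the receiver never exceeds the maximum, and since every transfer only moves value around, the sum stays the same.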

You might find that you need to use the decimal package to get a little more precision. You will likely also want to introduce guards to ensure that the constraints on the number of partitions and their minimum and maximum sizes are not inconsistent.

import random

partitions = 45
partition_size_min = 0.005
partition_size_max = 0.045

weights = [1.0 / partitions] * partitions
for index in range(len(weights)):
    partner_index = random.randint(0, partitions-1)

    # Cap the transfer so the giver stays above the minimum
    # and the receiver stays below the maximum.
    available_to_give = weights[index] - partition_size_min
    available_to_receive = partition_size_max - weights[partner_index]
    delta = random.uniform(0, min(available_to_give, available_to_receive))

    weights[partner_index] += delta
    weights[index] -= delta

print(f"Sum: {sum(weights)} Min:{min(weights)} Max: {max(weights)}")
print(weights)

That should give you a result like:

Sum: 1.0 Min:0.0057759435106850415 Max: 0.04428043727891049

[
    0.03408561328744241,
    0.010644787590344313,
    0.01400427089495221,
    0.01484912512559225,
    ...
    0.019186499047958494,
    0.02443794812733188,
    0.03475172101526412,
    0.020782296753987052
]

I'm sure a proper statistician would find a fault in this method, but it might get you close to what you want.
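
The guards mentioned earlier in this answer could be sketched along these lines. This is only a sketch under assumptions: the function name `random_weights` and its `rounds` and `seed` parameters are illustrative and not part of the snippet above, the bounds are treated as inclusive, and repeating the pass several times is an addition intended to mix the weights a little more.

```python
import random


def random_weights(n, lo=0.005, hi=0.045, total=1.0, rounds=10, seed=None):
    """Pairwise-transfer sketch: start uniform, then move value between slots."""
    # Guard: n values in [lo, hi] can only sum to `total`
    # if n * lo <= total <= n * hi.
    if not n * lo <= total <= n * hi:
        raise ValueError(
            f"infeasible: need {n} * {lo} <= {total} <= {n} * {hi}"
        )

    rng = random.Random(seed)
    weights = [total / n] * n
    for _ in range(rounds):
        for index in range(n):
            partner = rng.randrange(n)
            if partner == index:
                continue  # moving value to itself changes nothing
            can_give = weights[index] - lo
            can_receive = hi - weights[partner]
            # max(0.0, ...) protects against tiny negative rounding errors.
            delta = rng.uniform(0, max(0.0, min(can_give, can_receive)))
            weights[partner] += delta
            weights[index] -= delta
    return weights


weights = random_weights(50, seed=1)
print(f"Sum: {sum(weights)} Min: {min(weights)} Max: {max(weights)}")
```

With these bounds the guard rejects N = 22, because 22 × 0.045 = 0.99 is less than 1, so at least 23 weights are needed.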

CodePudding user response：

You might want to use the Dirichlet Rescale algorithm (DRS), for which a Python implementation is available as the drs package on PyPI.

That gives you a proper statistical guarantee. Quoting the abstract of the paper:

the vectors are uniformly distributed over the valid region of the domain of all possible vectors, bounded by the constraints.

Trying for 50 numbers summing to 1:

$ pip install drs
...
Installing collected packages: drs
Successfully installed drs-2.0.0
$ 
$ python3
Python 3.9.9 (main, Nov 19 2021, 00:00:00) 
...
>>>
>>> from drs import drs
>>>
>>> n = 50
>>> total = 1.0  # not named "sum", which would shadow the built-in sum()
>>>
>>> v1 = drs(n, total, n*[0.045], n*[0.005])
>>> sum(v1)
1.0000000000000004
>>> max(v1)
0.04278387127251347
>>> min(v1)
0.005035400173241331
>>> 
>>> v2 = drs(n, total, n*[0.045], n*[0.005])
>>> sum(v2)
0.9999999999999994
>>> max(v2)
0.04445793844097045
>>> min(v2)
0.005294943276519565
>>>
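
The sums in the session above come out as 1.0000000000000004 and 0.9999999999999994 rather than exactly 1.0, so any check of the third restriction has to allow a small tolerance. A minimal sketch of such a check follows; `check_weights` is a hypothetical helper name (not part of drs), and the tolerance is an assumption.

```python
import math


def check_weights(weights, n, lo=0.005, hi=0.045, total=1.0):
    """Return True if `weights` meets all three restrictions from the question."""
    return (
        len(weights) == n
        and all(lo < w < hi for w in weights)
        # Floating-point sums rarely land on exactly 1.0, so use a tolerance.
        and math.isclose(sum(weights), total, rel_tol=1e-9)
    )


print(check_weights([0.02] * 50, 50))
```

The same helper can be applied to the output of either answer, for example `check_weights(v1, n)` in the session above.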