What is the most efficient way to generate a list of random numbers all within a range that have a fixed sum?


I am trying to generate a list of 12 random weights for a stock portfolio in order to determine how the portfolio would have performed in the past given different weights assigned to each stock. The sum of the weights must of course be 1 and there is an additional restriction: each stock must have a weight between 1/24 and 1/4.

Although I am able to generate random numbers such that they all fall within the interval by using random.uniform(), as well as guarantee their sum is 1 by dividing each weighting by the sum of the weightings, I'm finding that

a) each subsequent array of weightings is very similar; I rarely get weightings near the upper boundary of 1/4

b) random.seed() does not seem to be working properly, whether I put it in the rand_weight() function or at the beginning of the for loop. I'm confused as to why, because I thought that reseeding would make my array of weights unique on each iteration. Currently, it's cyclical, with a period of 3.

The following is my code:

import random

import numpy as np

# boundaries on weightings
n = 12
min_weight = 1 / (2 * n)
max_weight = 25 / 100

def rand_weight(e):
    # reseeds Python's random module (not numpy's generator, which
    # np.random.uniform below actually draws from)
    random.seed()
    return e + np.random.uniform(min_weight, max_weight)

for i in range(100):
    weights = np.empty(12)
    # resample until every weight lies strictly inside the bounds
    while not (np.all(weights > min_weight) and np.all(weights < max_weight)):
        weights = np.array(list(map(rand_weight, weights)))
        weights /= np.sum(weights)

I have already tried scattering the weights by changing min_weight and max_weight inside the for loop so that rand_weight generates different values, but this makes the runtime really slow, because the "not" condition in the while loop takes longer to evaluate to false (the probability of all the numbers being in range decreases).

CodePudding user response:

The following works. Particularly confusing to me was that np.empty(12) seemed to always return the same array: once it had been initialized, it stayed the same. (See the short demonstration after the code below.)

This seems to produce numbers above 0.22 reasonably often.

import numpy as np
from random import random, seed

# boundaries on weightings
n = 12
min_weight = (1/(2*n))
max_weight = 25 / 100

seed(666)
for i in range(100):
    # all zeros fails the bounds check, so the while body always runs at least once
    weights = np.zeros(n)
    while not (np.all(weights > min_weight) and np.all(weights < max_weight)):
        weights = np.array([random() for _ in range(n)])
        # rescale so the weights sum to 1 after the min_weight shift below
        weights *= (1 - min_weight * n) / np.sum(weights)
        weights += min_weight
    print(weights)
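
As an aside, here is a minimal sketch (my own, not part of the original answer) of why np.empty can appear to "remember" values: it allocates without zeroing, and a new array of the same size may reuse the buffer that was just freed. In the question's loop that means np.empty(12) can hand back the previous iteration's already-valid weights, so the while condition is satisfied immediately and the body is skipped. The exact output is platform-dependent:

import numpy as np

a = np.empty(12)   # uninitialized memory, not zeros
a[:] = 123.0       # write recognizable values
del a              # free the buffer
b = np.empty(12)   # may reuse the just-freed buffer...
print(b)           # ...and then still shows the old 123.0 values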

CodePudding user response:

Let's start with some simple facts. If you want 12 iid numbers in the range [0.042...0.25] that sum to one, then the mean value is forced:

Sum(Xi) = 1

E[Sum(Xi)] = Sum(E[Xi]) = N·E[Xi] = 1

E[Xi] = 1/N = 1/12 ≈ 0.083

One corollary is that it will be hard to get numbers close to the upper boundary of the range: 0.25 is three times the forced mean of 1/12.
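
A quick empirical illustration of that corollary (my own snippet, not from the original answer): normalize 12 uniforms, as the question does, and measure how often any weight even reaches 0.2. The fraction comes out quite small, consistent with the argument above.

import numpy as np

# estimate how often naive normalization of 12 uniforms yields any weight >= 0.2
w = np.random.random((100_000, 12))
w /= w.sum(axis=1, keepdims=True)      # rows now sum to 1
print((w.max(axis=1) >= 0.2).mean())   # fraction of rows with a weight >= 0.2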

So instead of sampling arbitrary values and then normalizing to make the sum 1, it is better to use a known distribution whose values sum to 1 to begin with.

Let's use the Dirichlet distribution and sample points uniformly on the simplex, which means the alpha (concentration) vector is all ones.

import numpy as np

N = 12
# a flat alpha vector samples uniformly on the simplex; components sum to 1
s = np.random.dirichlet(N * [1.0], 1)
print(np.sum(s))   # 1.0

Some values will fall outside the bounds, and you can reject those samples:

def sampleWeights(alpha, lo, hi):
    while True:
        s = np.random.dirichlet(alpha, 1)[0]
        if np.any(s > hi):
            continue # reject
        if np.any(s < lo):
            continue # reject
        return s # accept

and call it like this:

N=12
alpha = N*[1.0]
q = sampleWeights(alpha, 1./24., 1./4.)

If you check, you will find that far more rejections happen at the lower bound than at the upper bound.
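
Here is a quick way to check that claim (my own instrumented variant of sampleWeights, not from the original answer). Samples violating both bounds count against the upper bound, matching the original check order:

import numpy as np

# count which bound triggers each rejection while collecting accepted samples
N, lo, hi = 12, 1. / 24., 1. / 4.
alpha = N * [1.0]
low_rej = high_rej = accepted = 0
while accepted < 1000:
    s = np.random.dirichlet(alpha, 1)[0]
    if np.any(s > hi):
        high_rej += 1   # rejected at the upper bound
    elif np.any(s < lo):
        low_rej += 1    # rejected at the lower bound
    else:
        accepted += 1
print("low-bound rejections:", low_rej, "high-bound rejections:", high_rej)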

The beauty of using the well-known Dirichlet distribution is that you can "concentrate" sampled values around the mean, e.g.

alpha = N*[10.0]
q = sampleWeights(alpha, 1./24., 1./4.)

will produce iid values with the same mean of 1/12 but a much smaller standard deviation; the RVs are far more concentrated around the mean.
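
A quick empirical check of that claim (my own snippet, not from the original answer): compare the per-component mean and standard deviation for alpha = 1 versus alpha = 10.

import numpy as np

# both alphas give mean 1/N; the larger alpha gives a much tighter spread
N = 12
for a in (1.0, 10.0):
    s = np.random.dirichlet(N * [a], 100_000)
    print(f"alpha={a}: mean={s.mean():.4f}, std={s.std():.4f}")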

And if you want non-identically distributed RVs, use different alphas:

alpha = [1.,2.,3.,4.,5.,6.,6.,5.,4.,3.,2.,1.]
q = sampleWeights(alpha, 1./24., 1./4.)

then some of the RVs will sit close to the upper boundary and some close to the lower one, since for a Dirichlet distribution E[Xi] = alpha_i / Sum(alpha_j). There are many advantages to using a known distribution.
