Home > Back-end >  Generate random integers with two constraints (sum and local maximum) in Python
Generate random integers with two constraints (sum and local maximum) in Python

Time:09-22

I have a dataframe where I want to create random numbers in a new column. The random numbers must fulfill two constraints:

  1. The random numbers must add up to a specified sum (in the example, the sum is 300)
  2. For each observation, the random numbers must not exceed a value in the constraint column.

In the example below, the constraints are fulfilled because the sum is 300 and the random number does not exceed the constraint column.

Example:

GEOID CONSTRAINT RANDOM
010010000001 100 80
010010000002 50 40
010010000003 75 60
010010000004 75 60
010010000005 100 60

It seems having random numbers totaling a sum has been demonstrated but I do not see an example with a second constraint.

Edit for clarity: The new column must be integers. The minimum lower bound value is 0.

CodePudding user response:

You could use the multinomial distribution to build an approximate answer:

def sample(total, constraints):
    import numpy as np
    rng = np.random.default_rng()
    samples = rng.multinomial(total, constraints / constraints.sum(), size=100)
    return next(val for val in samples if np.all(val < constraints))


df["RANDOM"] = sample(300, df["CONSTRAINT"].values)
print(df)

Output

             GEOID  CONSTRAINT  RANDOM
0  10010000001         100      81
1  10010000002          50      42
2  10010000003          75      57
3  10010000004          75      53
4  10010000005         100      67

Thanks goes to @Michael Szczesny for testing the solution.

The key to solve this, relies in (quote from numpy docs):

Its values, X_i = [X_0, X_1, ..., X_p], represent the number of times the outcome was i.

see more details in this blog post.

CodePudding user response:

Here is my solution, let me know what you think about it:

total_sum = 300  # total we want to obtain, is dynamical

# We check if the problem is realistic
if sum(df["CONSTRAINT"]) < total_sum:
    raise Exception("IMPOSSIBLE TO REACH")

random_value_list = []  # values for the new column RANDOM

# We shuffle just to make sure the function works well in any case, (can keep the original order with older indexes)
df = df.sample(frac=1)
df = df.reset_index(drop=False)

for i in range(df.shape[0] - 1):

    total_reached = sum(random_value_list)
    total_yet_to_reach = total_sum - total_reached
    max_value = min(df.at[i, "CONSTRAINT"], total_yet_to_reach)
    # we put constraints or max value to not go above 300 as conditions

    sum_next_constraints = sum(df.loc[i   1:, :]["CONSTRAINT"])
    
    random_value = random.randint(0, max_value)
    # this while condition is a conditions for the minimum of the random.randint
    while random_value   sum_next_constraints   total_reached < total_sum:
        random_value = random.randint(0, max_value)
    random_value_list.append(random_value)

# complete the list to get 300 (can be 0)
random_value_list.append(total_sum - sum(random_value_list))

df["RANDOM"] = random_value_list
df

Here is just one of the example of what you can get :

   index         GEOID  CONSTRAINT  RANDOM
0      3  010010000004          75      49
1      0  010010000001         100      94
2      1  010010000002          50      10
3      2  010010000003          75      69
4      4  010010000005         100      78

We can check that:

sum(df["RANDOM"])
Out : 300
  • Related