How to generate random numbers that can be summed to a specific value?

I have two dataframes as follows:

import pandas as pd
import numpy as np
# Create data set.
dataSet1 = {'id': ['A', 'B', 'C'],
           'value' : [9,20,20]}
dataSet2 = {'id' : ['A', 'A','A','B','B','B','C'],
            'id_2': [1, 2, 3, 2,3,4,1]}
# Create dataframe with data set and named columns.
df_map1 = pd.DataFrame(dataSet1, columns= ['id', 'value'])

df_map2 = pd.DataFrame(dataSet2, columns= ['id','id_2'])

df_map1

    id  value
0   A   9
1   B   20
2   C   20

df_map2

    id  id_2
0   A   1
1   A   2
2   A   3
3   B   2
4   B   3
5   B   4
6   C   1

Each id can appear several times in df_map2, once for every id_2 it maps to.

# Do a quick merge on id.
df = df_map1.merge(df_map2 ,on=['id'])

    id  value   id_2
0   A   9         1
1   A   9         2
2   A   9         3
3   B   20        2
4   B   20        3
5   B   20        4
6   C   20        1

The relationship between id and id_2 can be represented as follows:

id_ref = df.groupby('id')['id_2'].apply(list).to_dict()
{'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [1]}

Now I would like to generate, for each row, a list of random integers (say 5 elements, each from 0 up to but not including 3), store it in the dataframe, and later explode it.

import numpy as np
# The upper bound of np.random.randint is exclusive, so each list holds 5 values from {0, 1, 2}.
df['random_value'] = df.apply(lambda _: np.random.randint(0,3, 5), axis=1)

    id  value   id_2        random_value
0   A   9        1        [0, 0, 0, 0, 1]
1   A   9        2        [0, 2, 1, 2, 1]
2   A   9        3        [0, 1, 2, 2, 1]
3   B   20       2        [2, 1, 1, 2, 2]
4   B   20       3        [0, 0, 0, 0, 0]
5   B   20       4        [1, 0, 0, 1, 0]
6   C   20       1        [1, 2, 2, 2, 1]

The condition for generating the random_value lists is that, for each id, all the elements across its lists must sum to that id's value (9 for id A).

In the example above, summing every element in the lists for id A gives 1 + 6 + 6 = 13, but what we want is 9.

The same applies to ids B and C (each should total 20), and so on.

Is there any way to achieve this?
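
To make the mismatch concrete, here is a small sketch that totals the example lists per id and compares them with the wanted values. The dictionary names `example` and `target` are only for illustration; the numbers are copied from the tables above.

```python
# The random_value lists from the example table, keyed by id.
example = {
    "A": [[0, 0, 0, 0, 1], [0, 2, 1, 2, 1], [0, 1, 2, 2, 1]],
    "B": [[2, 1, 1, 2, 2], [0, 0, 0, 0, 0], [1, 0, 0, 1, 0]],
    "C": [[1, 2, 2, 2, 1]],
}
# The value column for each id, i.e. the total each id should reach.
target = {"A": 9, "B": 20, "C": 20}

# Sum every element of every list belonging to an id.
actual = {key: sum(sum(row) for row in rows) for key, rows in example.items()}

for key in example:
    print(key, "actual:", actual[key], "wanted:", target[key])
```

None of the three ids hits its target, which is expected when each row is drawn independently.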

I was looking into np.random.multinomial, which seems like it should be the solution, but I'm not sure how to apply it with pandas.

np.random.multinomial(9, np.ones(5)/5, size = 1)[0]

=> array([2, 3, 3, 0, 1])

2 + 3 + 3 + 0 + 1 = 9
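
As a quick sanity check, the sketch below draws one multinomial sample and confirms that it sums to the requested total. It assumes NumPy's newer Generator API (`np.random.default_rng`) with a fixed seed, which is not what the snippet above uses; the legacy `np.random.multinomial` call behaves the same way for this purpose.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed is an arbitrary choice).
rng = np.random.default_rng(seed=0)

# Distribute 9 units over 5 equally likely slots.
counts = rng.multinomial(9, np.ones(5) / 5)

print(counts, counts.sum())
```

The sum is always 9, but individual entries are not capped, so a slot can hold a value larger than the 0 to 2 range used with randint earlier.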

ATTEMPT/IDEA ...

Given that we have the list of id_2 values for each id, e.g. id A has 3 distinct elements [1, 2, 3],

id A is mapped to 3 different rows, so we need

3 * 5 = 15 numbers (which will be our long list), where

3: length of the id_2 list (number of rows for the id)

5: number of elements in each row's list

Hence:

list_A = np.random.multinomial(9,np.ones(3*5)/(3*5) ,size = 1)[0]

and then we split the long list evenly into chunks of n elements (n = 5 here), using this list comprehension:

[list_A[i:i + n] for i in range(0, len(list_A), n)]

But I am still unsure how to do this dynamically for every id.

CodePudding user response：

The core idea is as you said (drawing 3*5=15 numbers), plus reshaping them into a 2D array with as many rows as that id has in the dataframe. The following function does that:

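def generate_random_numbers(df):
    # Every row in a group shares the same value, which is the target sum.
    value = df['value'].iloc[0]

    list_len = 5                    # elements per row
    num_rows = len(df)              # rows in this group
    num_rand = list_len*num_rows    # total numbers to draw for the group

    # Draw num_rand counts that sum to value, reshape them to one row per
    # dataframe row, and return the rows as lists aligned with the group's index.
    return pd.Series(
        map(list, np.random.multinomial(value, np.ones(num_rand)/num_rand).reshape(num_rows, -1)),
        df.index
    )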
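
To see the draw-and-reshape step in isolation, here is a minimal sketch for a single group, assuming the figures for id A (value 9, 3 rows, 5 elements per row). It uses a seeded `np.random.default_rng` generator rather than the legacy call in the function, and the variable names are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for repeatability only

value, num_rows, list_len = 9, 3, 5
num_rand = num_rows * list_len

# One flat draw of 15 counts that together sum to 9.
flat = rng.multinomial(value, np.ones(num_rand) / num_rand)

# Reshape into 3 rows; -1 lets NumPy infer the 5 columns.
grid = flat.reshape(num_rows, -1)
rows = grid.tolist()

print(rows)
print("total:", sum(sum(row) for row in rows))
```

Reshaping only rearranges the drawn numbers, so the group total stays at 9 no matter how the rows are cut.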

And apply it:

df['random_value'] = df.groupby(['id', 'value'], as_index=False).apply(generate_random_numbers).droplevel(0)
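
To check the result end to end, the sketch below rebuilds the merged dataframe and verifies that each id's lists add up to its value. It is a variant, not the code above: it loops over the groups explicitly instead of calling groupby.apply (recent pandas versions emit a deprecation warning when apply touches the grouping columns, as the function above does with 'value'), and the helper name `random_lists_for_group` and the seeded generator are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def random_lists_for_group(group, list_len, rng):
    """Return one list per row of `group`; all lists together sum to the group's value."""
    total = int(group["value"].iloc[0])
    num_rows = len(group)
    num_rand = list_len * num_rows
    draws = rng.multinomial(total, np.ones(num_rand) / num_rand)
    return pd.Series(draws.reshape(num_rows, list_len).tolist(), index=group.index)

# The merged dataframe from the question, written out directly.
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B", "C"],
    "value": [9, 9, 9, 20, 20, 20, 20],
    "id_2": [1, 2, 3, 2, 3, 4, 1],
})

rng = np.random.default_rng(0)  # arbitrary seed
pieces = [random_lists_for_group(group, 5, rng) for _, group in df.groupby("id")]
df["random_value"] = pd.concat(pieces)  # aligned back to df by index

# Sum each row's list, then total those sums per id.
totals = df["random_value"].apply(sum).groupby(df["id"]).sum()
print(df)
print(totals)
```

Each id's total should match its value column (9, 20 and 20 here), while the individual lists differ from run to run unless the seed is fixed.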