Assigning random value for categories in pandas

Time:06-17

I have a df

Name        Week
Google      1
Google      1
Amazon      1
Tesla       1
Tesla       1
Google      2
Google      2
Tesla       2
Tesla       2
Uber        3
Uber        3

I am trying to create a new column, Value, holding a random integer between x and y for each combination of Name and Week, like so:

Name        Week        Value
Google      1           100
Google      1           100
Amazon      1           150
Tesla       1           170
Tesla       1           170
Google      2           250
Google      2           250
Tesla       2           157
Tesla       2           157
Uber        3           500
Uber        3           500

The same value should be assigned to every row that shares a combination of Name and Week.

I tried:

import itertools
import numpy as np

def random_group_int(df_):

    week = df_.Week
    name = df_.Name

    # itertools.product yields every (Name, Week) pair;
    # itertools.combinations expects an integer length as its second argument
    combinations = list(itertools.product(df_.Name.unique(), df_.Week.unique()))

    rand_values_dict_by_combination = {combination: np.random.randint(100,200) for combination in combinations}

    # return value by the combination on the line
    # don't know how to do that

And I feel like this is not the best approach. I also tried:

# the group keys land in the index after groupby; reset_index turns them back into columns
df_rand = df.groupby(['Name','Week']).size().reset_index(name='Count')
df_rand['Value'] = df_rand['Count'].apply(lambda x : np.random.randint(100,200))
df = df.merge(df_rand[['Value', 'Name', 'Week']], on = ['Name', 'Week'], how = 'left')

Which does work but again, I am not sure if that's the approach I should be using.

CodePudding user response:

You can use GroupBy.transform and generate one random value per group inside the transform; the scalar returned for a group is broadcast to all of its rows:

import random
x, y = 100, 200
df['Value'] = (df.groupby(['Name', 'Week'])['Name'] # the column doesn't matter
                 .transform(lambda _: random.randint(x, y))
               )

Example output:

      Name  Week  Value
0   Google     1    153
1   Google     1    153
2   Amazon     1    196
3    Tesla     1    198
4    Tesla     1    198
5   Google     2    122
6   Google     2    122
7    Tesla     2    180
8    Tesla     2    180
9     Uber     3    106
10    Uber     3    106
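
To try the transform approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question; the helper name `add_group_random` and its `seed` parameter are made up for illustration and are not part of pandas.

```python
import random

import pandas as pd


def add_group_random(df, keys, low, high, seed=None):
    """Return a copy of df with a 'Value' column holding one random
    integer in [low, high] (both ends inclusive) per combination of keys."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    out = df.copy()
    # transform calls the lambda once per group and broadcasts the
    # returned scalar to every row of that group
    out["Value"] = out.groupby(keys)[keys[0]].transform(
        lambda _: rng.randint(low, high)
    )
    return out


# sample data copied from the question
df = pd.DataFrame({
    "Name": ["Google", "Google", "Amazon", "Tesla", "Tesla",
             "Google", "Google", "Tesla", "Tesla", "Uber", "Uber"],
    "Week": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})
result = add_group_random(df, ["Name", "Week"], 100, 200, seed=0)
```

Passing a seed only makes the draws repeatable between runs; leave it as None for fresh values each time. Two different groups can still receive the same number by chance.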

CodePudding user response:

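This should work for your needs:

import numpy as np
import pandas as pd

s = df.drop_duplicates().copy()  # one row per unique (Name, Week) pair
s['random_int'] = np.random.randint(0, 100, size=len(s))  # one draw per pair, upper bound excluded
df_merge = pd.merge(df, s, on=['Name', 'Week'], how='left')
df_merge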
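
Both answers come down to drawing one number per unique (Name, Week) pair and spreading it over the matching rows. As a possible alternative (not taken from either answer), the sketch below labels each row with its group number via `ngroup()` and indexes into an array of pre-drawn values, which avoids both the per-group lambda and the merge. The bounds 100 and 200 and the seed 42 are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "Name": ["Google", "Google", "Amazon", "Tesla", "Tesla",
             "Google", "Google", "Tesla", "Tesla", "Uber", "Uber"],
    "Week": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})

rng = np.random.default_rng(42)  # seeded only so the sketch is repeatable

# integer label per row, running from 0 to n_groups - 1
group_ids = df.groupby(["Name", "Week"]).ngroup()

# one draw per group; endpoint=True makes the upper bound inclusive
draws = rng.integers(100, 200, size=group_ids.max() + 1, endpoint=True)

# look up each row's draw by its group label
df["Value"] = draws[group_ids.to_numpy()]
```

This assumes the key columns contain no missing values, since `ngroup()` labels rows with a missing key as -1 by default.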