I have a DataFrame df:
Name Week
Google 1
Google 1
Amazon 1
Tesla 1
Tesla 1
Google 2
Google 2
Tesla 2
Tesla 2
Uber 3
Uber 3
I am trying to create a new column Value, which would be a random integer between x and y for each combination of Name and Week, like so:
Name Week Value
Google 1 100
Google 1 100
Amazon 1 150
Tesla 1 170
Tesla 1 170
Google 2 250
Google 2 250
Tesla 2 157
Tesla 2 157
Uber 3 500
Uber 3 500
The same value should be assigned to every row that shares a combination of Name and Week.
I tried:
def random_group_int(df_):
    week = df_.week_no
    supplier = df_.sm_supp_name
    combinations = list(itertools.combinations(df.Week.unique(), df.Name.unique()))
    rand_values_dict_by_combination = {combination: np.random.randint(100,200) for combination in combinations}
    # return value by the combination on the line
    # don't know how to do that
And I feel like this is not the best approach. I also tried:
df_rand = df.groupby(['Name','Week']).count()
df_rand['Value'] = df_rand['Week'].apply(lambda x : np.random.randint(100,200))
df_rand.reset_index(inplace = True)
df.merge(df_rand[['Value', 'Name', 'Week']], left_on = ['Name', 'Week'], right_on = ['Name', 'Week'], how = 'left')
Which does work but again, I am not sure if that's the approach I should be using.
CodePudding user response:
You can use GroupBy.transform and generate a random value inside the transform:
import random
x, y = 100, 200  # random.randint includes both endpoints
df['Value'] = (df.groupby(['Name', 'Week'])['Name']  # the column doesn't matter
                 .transform(lambda _: random.randint(x, y))
              )
Example output:
Name Week Value
0 Google 1 153
1 Google 1 153
2 Amazon 1 196
3 Tesla 1 198
4 Tesla 1 198
5 Google 2 122
6 Google 2 122
7 Tesla 2 180
8 Tesla 2 180
9 Uber 3 106
10 Uber 3 106
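On larger frames, a vectorized variant may avoid calling a Python function once per group. The sketch below is one possible way to do it, assuming NumPy is available and the key columns contain no missing values: it numbers each (Name, Week) group with GroupBy.ngroup and uses those numbers to index an array holding one random draw per group. The helper name add_group_random and its seed parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def add_group_random(df, keys, low, high, seed=None):
    """Return a copy of df with a 'Value' column shared within each key group.

    Hypothetical helper for illustration; assumes the key columns have no NaN.
    """
    rng = np.random.default_rng(seed)
    # ngroup() labels every row with the integer id (0..n_groups-1) of its group
    group_ids = df.groupby(keys, sort=False).ngroup()
    n_groups = group_ids.max() + 1 if len(df) else 0
    # one draw per group; endpoint=True makes `high` inclusive
    draws = rng.integers(low, high, size=n_groups, endpoint=True)
    out = df.copy()
    # look up each row's draw by its group id
    out["Value"] = draws[group_ids.to_numpy()]
    return out


df = pd.DataFrame({
    "Name": ["Google", "Google", "Amazon", "Tesla", "Tesla", "Google",
             "Google", "Tesla", "Tesla", "Uber", "Uber"],
    "Week": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})
result = add_group_random(df, ["Name", "Week"], 100, 200, seed=0)
```

Passing a seed makes the draws repeatable between runs; leaving it as None gives fresh values each time.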
CodePudding user response:
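This should work for your needs:
import numpy as np
import pandas as pd
s = df.drop_duplicates().copy()  # one row per distinct combination; copy so the assignment below is safe
s['random_int'] = np.random.randint(0, 100, size=len(s))  # upper bound (100) is exclusive
df_merge = pd.merge(df, s, how='left')  # merges on the columns common to both frames (Name and Week here)
df_merge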
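Whichever approach you pick, it may be worth checking that every (Name, Week) group really ends up with a single value. Here is a minimal, self-contained sketch of the drop_duplicates-and-merge idea with such a check, assuming the sample frame from the question; the seed 42 and the names pairs, merged and values_per_group are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Google", "Google", "Amazon", "Tesla", "Tesla", "Google",
             "Google", "Tesla", "Tesla", "Uber", "Uber"],
    "Week": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})

rng = np.random.default_rng(42)  # arbitrary seed, only for repeatable draws

# one row per distinct (Name, Week) pair
pairs = df[["Name", "Week"]].drop_duplicates().reset_index(drop=True)
# one draw per pair; the upper bound (200) is excluded by default
pairs["Value"] = rng.integers(100, 200, size=len(pairs))

# a left merge keeps every row of df, in its original order
merged = df.merge(pairs, on=["Name", "Week"], how="left")

# sanity check: each group should hold exactly one distinct value
values_per_group = merged.groupby(["Name", "Week"])["Value"].nunique()
assert (values_per_group == 1).all()
```

Restricting drop_duplicates to the key columns keeps the approach working even if df has other columns that differ between rows of the same group.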