Pandas groupby apply a random day to each group of years

I am trying to generate a different random day within each year group of a dataframe. So I need replace=False, otherwise it will fail.

I can't just add one column of random numbers for the whole dataframe, because my list of years is going to have more than 365 entries, and once you pass 365 no more random samples can be drawn without replacement.

I have explored agg, aggregate, apply and transform. The closest I have got is with this:

    import numpy as np
    import pandas as pd

    years = pd.DataFrame({"year": [1,1,2,2,2,3,3,4,4,4,4]})
    years["day"] = 0
    grouped = years.groupby("year")["day"]
    grouped.transform(lambda x: np.random.choice(366, replace=False))

Which gives this:

0       8
1       8
2     319
3     319
4     319
5     149
6     149
7     130
8     130
9     130
10    130
Name: day, dtype: int64

But I want this:

0       8
1      16
2     119
3     321
4     333
5       4
6      99
7      30
8     129
9     224
10    355
Name: day, dtype: int64

CodePudding user response：

You can use your code with a minor modification. You have to specify the number of samples.

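    # Draw len(x) distinct days from 1..365, one per row of the group
    random_days = lambda x: np.random.choice(range(1, 366), len(x), replace=False)
    # Select the 'day' column so that transform returns a Series
    years['day'] = years.groupby('year')['day'].transform(random_days)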

Output:

>>> years
    year  day
0      1   18
1      1  300
2      2  154
3      2  355
4      2  311
5      3   18
6      3   14
7      4  160
8      4  304
9      4   67
10     4    6
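
To check that the days really are unique within each year, here is a minimal sketch along the same lines. The seeded `rng`, the helper name `unique_days` and the 1..366 range are my own assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded Generator, used only to make the draw repeatable.
rng = np.random.default_rng(0)

years = pd.DataFrame({"year": [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4]})

def unique_days(group):
    # Hypothetical helper: one distinct day in 1..366 per row of the group.
    return rng.choice(np.arange(1, 367), size=len(group), replace=False)

# transform calls unique_days once per year and aligns the result by index.
years["day"] = years.groupby("year")["year"].transform(unique_days)

# True when no (year, day) pair occurs twice.
no_repeats = not years.duplicated(["year", "day"]).any()
print(years)
print(no_repeats)
```

A day can still repeat across different years, since each group is sampled independently.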

CodePudding user response：

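With a single NumPy draw for the whole frame, followed by a shuffle within each group. Because the draw is without replacement across all rows, this only works while the frame has at most 366 rows:

    # One distinct value in 0..365 per row, drawn across the whole frame
    years["day"] = np.random.choice(366, years.shape[0], replace=False)

    # Shuffle the drawn days within each year
    years["day"] = years.groupby("year")["day"].transform(lambda x: np.random.permutation(x))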

Output :

print(years)

    year  day
0      1  233
1      1  147
2      2    1
3      2  340
4      2  267
5      3  204
6      3  256
7      4  354
8      4   94
9      4  196
10     4  164
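
A small sketch of how the two approaches behave once the frame grows past 366 rows, which is the case the question is worried about. The frame `big` (200 years with 2 rows each) is hypothetical data made up for this check.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 years with 2 rows each, 400 rows in total.
big = pd.DataFrame({"year": np.repeat(np.arange(200), 2)})

try:
    # One draw without replacement across the whole frame: 400 > 366.
    np.random.choice(366, big.shape[0], replace=False)
    whole_frame_ok = True
except ValueError:
    # NumPy refuses to take a sample larger than the population.
    whole_frame_ok = False

# Drawing per group only needs each year to have at most 366 rows.
# Values are in 0..365, as in the snippet above.
big["day"] = big.groupby("year")["year"].transform(
    lambda x: np.random.choice(366, len(x), replace=False)
)
per_group_unique = not big.duplicated(["year", "day"]).any()

print(whole_frame_ok)
print(per_group_unique)
```

So for more than 366 rows, sampling inside each group looks like the safer choice, since the limit then applies per year rather than to the whole frame.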