I have this set of data :
df = pd.DataFrame()
df['Date'] = ["29/07/2021", "29/07/2021", "29/07/2021", "29/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "31/07/2021", "31/07/2021", "01/08/2021", "01/08/2021", "02/08/2021"]
df['Time'] = ["06:48:00", "06:53:00", "06:56:00", "06:59:00", "07:14:00", "07:18:00", "07:40:00", "08:12:00", "08:42:00", "08:57:00", "05:45:00", "05:55:00", "06:05:00"]
df["Column1"] = [0.011534891, 0.013458399, 0.017792937, 0.018807581, 0.025931434, 0.025163517, 0.026561283, 0.027743659, 0.028854, 0.000383506, 0.000543031, 0.000342, 0.000313769]
I want to roll over 1 day and find the mean value of "Column1".
I tried this code :
df.index = df["Date"]
df['mean'] = df["Column1"].rolling(1440, min_periods=1).mean()
print(df['mean'])
output:
Date
29/07/2021 0.011535
29/07/2021 0.012497
29/07/2021 0.014262
29/07/2021 0.015398
30/07/2021 0.017505
30/07/2021 0.018781
30/07/2021 0.019893
30/07/2021 0.020874
31/07/2021 0.021761
31/07/2021 0.019623
01/08/2021 0.017889
01/08/2021 0.016426
02/08/2021 0.015187
As you can see, the dates are repeating and that is not correct.
I expected to have something like this (i want the mean value of those values i just used mean to show you what i needed but it will be just one value as a mean one):
Date
29/07/2021 mean(0.011535, 0.012497, 0.014262, 0.015398)
30/07/2021 mean(0.017505, 0.018781, 0.019893, 0.020874)
31/07/2021 mean(0.021761, 0.019623)
01/08/2021 mean(0.017889, 0.016426)
02/08/2021 0.015187
I also tried this code :
df['DateTime'] = pd.to_datetime(df['Date'], dayfirst=True) pd.to_timedelta(df['Time'])
df.index = df['DateTime']
df['mean'] = df["Column1"].rolling('1D').mean()
print(df['mean'])
output :
DateTime
2021-07-29 06:48:00 0.011535
2021-07-29 06:53:00 0.012497
2021-07-29 06:56:00 0.014262
2021-07-29 06:59:00 0.015398
2021-07-30 07:14:00 0.025931
2021-07-30 07:18:00 0.025547
2021-07-30 07:40:00 0.025885
2021-07-30 08:12:00 0.026350
2021-07-31 08:42:00 0.028854
2021-07-31 08:57:00 0.014619
2021-08-01 05:45:00 0.009927
2021-08-01 05:55:00 0.007531
2021-08-02 06:05:00 0.000314
But I get a different result and still not the one desired.
CodePudding user response:
Rather than rolling
, the desired output calls for groupby
mean
:
out = df.groupby('Date', as_index=False)['Column1'].mean()
Output:
Date Column1
0 01/08/2021 0.000443
1 02/08/2021 0.000314
2 29/07/2021 0.015398
3 30/07/2021 0.026350
4 31/07/2021 0.014619