Home > Software engineering >  Roll over 1 day when you have repeated dates
Roll over 1 day when you have repeated dates

Time:04-21

I have this set of data :

df = pd.DataFrame()
df['Date'] = ["29/07/2021", "29/07/2021", "29/07/2021", "29/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "31/07/2021", "31/07/2021", "01/08/2021", "01/08/2021", "02/08/2021"]
df['Time'] = ["06:48:00", "06:53:00", "06:56:00", "06:59:00", "07:14:00", "07:18:00", "07:40:00", "08:12:00", "08:42:00", "08:57:00", "05:45:00", "05:55:00", "06:05:00"]
df["Column1"] = [0.011534891, 0.013458399,  0.017792937, 0.018807581, 0.025931434, 0.025163517, 0.026561283, 0.027743659, 0.028854, 0.000383506, 0.000543031, 0.000342, 0.000313769]

I want to roll over 1 day and find the mean value of "Column1".

I tried this code :

df.index = df["Date"]
df['mean'] = df["Column1"].rolling(1440, min_periods=1).mean()
print(df['mean'])

output:
Date
29/07/2021    0.011535
29/07/2021    0.012497
29/07/2021    0.014262
29/07/2021    0.015398
30/07/2021    0.017505
30/07/2021    0.018781
30/07/2021    0.019893
30/07/2021    0.020874
31/07/2021    0.021761
31/07/2021    0.019623
01/08/2021    0.017889
01/08/2021    0.016426
02/08/2021    0.015187

As you can see, the dates are repeating and that is not correct.

I expected to have something like this (i want the mean value of those values i just used mean to show you what i needed but it will be just one value as a mean one):

Date
29/07/2021    mean(0.011535, 0.012497, 0.014262, 0.015398)
30/07/2021    mean(0.017505, 0.018781, 0.019893, 0.020874)
31/07/2021    mean(0.021761, 0.019623)
01/08/2021    mean(0.017889, 0.016426)
02/08/2021    0.015187

I also tried this code :

df['DateTime'] = pd.to_datetime(df['Date'], dayfirst=True)   pd.to_timedelta(df['Time'])
df.index = df['DateTime']
df['mean'] = df["Column1"].rolling('1D').mean()
print(df['mean'])

output :
DateTime
2021-07-29 06:48:00    0.011535
2021-07-29 06:53:00    0.012497
2021-07-29 06:56:00    0.014262
2021-07-29 06:59:00    0.015398
2021-07-30 07:14:00    0.025931
2021-07-30 07:18:00    0.025547
2021-07-30 07:40:00    0.025885
2021-07-30 08:12:00    0.026350
2021-07-31 08:42:00    0.028854
2021-07-31 08:57:00    0.014619
2021-08-01 05:45:00    0.009927
2021-08-01 05:55:00    0.007531
2021-08-02 06:05:00    0.000314

But I get a different result and still not the one desired.

CodePudding user response:

Rather than rolling, the desired output calls for groupby mean:

out = df.groupby('Date', as_index=False)['Column1'].mean()

Output:

         Date   Column1
0  01/08/2021  0.000443
1  02/08/2021  0.000314
2  29/07/2021  0.015398
3  30/07/2021  0.026350
4  31/07/2021  0.014619
  • Related