I have a dataframe with two columns, date and values.
import numpy as np
import pandas as pd
import datetime
from pandas import Timestamp
a = [[Timestamp('2014-06-17 00:00:00'), 0.023088847378082145],
[Timestamp('2014-06-18 00:00:00'), -0.02137513226556209],
[Timestamp('2014-06-19 00:00:00'), -0.023107608748262454],
[Timestamp('2014-06-20 00:00:00'), -0.005373831609931101],
[Timestamp('2014-06-23 00:00:00'), 0.0013989552359290336],
[Timestamp('2014-06-24 00:00:00'), 0.02109937927428618],
[Timestamp('2014-06-25 00:00:00'), -0.008350303722982733],
[Timestamp('2014-06-26 00:00:00'), -0.037202662556428456],
[Timestamp('2014-06-27 00:00:00'), 0.00019764611153205713],
[Timestamp('2014-06-30 00:00:00'), 0.003260577288983324],
[Timestamp('2014-07-01 00:00:00'), -0.0072877596184343085],
[Timestamp('2014-07-02 00:00:00'), 0.010168645518006336],
[Timestamp('2014-07-03 00:00:00'), -0.011539447143668391],
[Timestamp('2014-07-04 00:00:00'), 0.025285678867997374],
[Timestamp('2014-07-07 00:00:00'), -0.004602922207492033],
[Timestamp('2014-07-08 00:00:00'), -0.031298707413768834],
[Timestamp('2014-07-09 00:00:00'), 0.005929355847110296],
[Timestamp('2014-07-10 00:00:00'), -0.0037464360290646592],
[Timestamp('2014-07-11 00:00:00'), -0.030786217361942203],
[Timestamp('2014-07-14 00:00:00'), -0.004914625647469917],
[Timestamp('2014-07-15 00:00:00'), 0.010865602291856957],
[Timestamp('2014-07-16 00:00:00'), 0.018000430446729165],
[Timestamp('2014-07-17 00:00:00'), -0.007274924758687407],
[Timestamp('2014-07-18 00:00:00'), -0.005852455583728933],
[Timestamp('2014-07-21 00:00:00'), 0.021397540863909104],
[Timestamp('2014-07-22 00:00:00'), 0.03337842963821558],
[Timestamp('2014-07-23 00:00:00'), 0.0022309307682939483],
[Timestamp('2014-07-24 00:00:00'), 0.007548983718178803],
[Timestamp('2014-07-25 00:00:00'), -0.018442920569716525],
[Timestamp('2014-07-28 00:00:00'), -0.015902529445214975]]
df = pd.DataFrame(a, columns=['dates', 'Values'])
I want to calculate the average of the column Values aggregating each 5 days. The expected outcome in dataframe should be something like
Average value
0 avg of first 5days
1 avg of next 5days
2 avg of next 5days
3 avg of next 5days
4 avg of next 5days
5 avg of next 5days
If possible then please help me to get a dataframe something like the below,
Group Days Average value
0 0 avg of first 5days
1 1 avg of next 5days
2 2 avg of next 5days
3 3 avg of next 5days
4 4 avg of next 5days
5 5 avg of next 5days
Please help me with this.
CodePudding user response:
Use DataFrame.resample
with aggregate mean
by 5D
for 5 days:
df = df.resample('5D', on='dates')['Values'].mean().reset_index()
print (df)
dates Values
0 2014-06-17 -0.006692
1 2014-06-22 -0.005764
2 2014-06-27 -0.001277
3 2014-07-02 0.007972
4 2014-07-07 -0.012901
5 2014-07-12 0.007984
6 2014-07-17 0.002757
7 2014-07-22 0.006179
8 2014-07-27 -0.015903
EDIT: If need omit Sundays, Saturdays and holidays use 5B
for bussiness days:
df = df.resample('5B', on='dates')['Values'].mean().reset_index()
print (df)
dates Values
0 2014-06-17 -0.005074
1 2014-06-24 -0.004199
2 2014-07-01 0.002405
3 2014-07-08 -0.012963
4 2014-07-15 0.007427
5 2014-07-22 0.001763
If need Group
column use arange
:
df = df.resample('5D', on='dates')['Values'].mean().reset_index()
df['Group'] = np.arange(len(df))
print (df)
dates Values Group
0 2014-06-17 -0.006692 0
1 2014-06-22 -0.005764 1
2 2014-06-27 -0.001277 2
3 2014-07-02 0.007972 3
4 2014-07-07 -0.012901 4
5 2014-07-12 0.007984 5
6 2014-07-17 0.002757 6
7 2014-07-22 0.006179 7
8 2014-07-27 -0.015903 8