I have a dataframe with a timestamp column/index and I am calculating the moving average over the last 5 seconds.
df['Mid-Price'].rolling(window=time_diff, min_periods=1, closed='both').mean()
So far so good.
Now I also need to calculate the moving average for the next 5 seconds. However, my timestamps are not evenly spaced, so I can't simply shift the dataframe and recompute the second average.
The data looks like this:
Index, Timestamp, Price, Start, Stop
0, 2019-01-02 08:30:00, 56.565, 0, 5
1, 2019-01-02 08:30:01, 56.565, 1, 6
2, 2019-01-02 08:30:02, 56.565, 2, 6
3, 2019-01-02 08:30:03, 56.540, 3, 7
4, 2019-01-02 08:30:04, 56.545, 4, 7
5, 2019-01-02 08:30:05, 56.545, 5, 8
6, 2019-01-02 08:30:07, 56.540, 6, 10
7, 2019-01-02 08:30:09, 56.550, 7, 12
8, 2019-01-02 08:30:10, 56.545, 8, 12
9, 2019-01-02 08:30:11, 56.550, 9, 12
10,2019-01-02 08:30:12, 56.570, 10, 13
For example, at index 5 the average over the last 5 seconds would be 56.5542. I also need to compute the average over the next 5 seconds excluding the current time, i.e. over indices 6, 7 and 8 (56.545).
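The example above can be reproduced with a small sketch. It rebuilds the sample data (the column name `Price` and the index name `Timestamp` are taken from the table, and `time_diff` is assumed to be a 5-second `Timedelta`), computes the backward-looking mean, and checks the expected forward-looking value at index 5 with a plain boolean mask. The mask is only a reference calculation for one row, not a fast solution.

```python
import pandas as pd

# Sample data from the question; timestamps are irregularly spaced.
times = pd.to_datetime([
    "2019-01-02 08:30:00", "2019-01-02 08:30:01", "2019-01-02 08:30:02",
    "2019-01-02 08:30:03", "2019-01-02 08:30:04", "2019-01-02 08:30:05",
    "2019-01-02 08:30:07", "2019-01-02 08:30:09", "2019-01-02 08:30:10",
    "2019-01-02 08:30:11", "2019-01-02 08:30:12",
])
prices = [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
          56.540, 56.550, 56.545, 56.550, 56.570]
df = pd.DataFrame({"Price": prices},
                  index=pd.DatetimeIndex(times, name="Timestamp"))

# Assumption: the window length used in the question is 5 seconds.
time_diff = pd.Timedelta(seconds=5)

# Backward-looking mean over [t - 5s, t], both endpoints included.
past = df["Price"].rolling(window="5s", min_periods=1, closed="both").mean()

# Reference value for the forward-looking mean at index 5:
# rows strictly after t and no later than t + 5s.
t = df.index[5]
mask = (df.index > t) & (df.index <= t + time_diff)
future_at_5 = df.loc[mask, "Price"].mean()

print(round(past.iloc[5], 4), round(future_at_5, 4))
```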
Using df.index.get_indexer() I am able to extract the index of the last row to be included in the average:
df['stop'] = df.index.get_indexer(df['Date-Time-Exch'] + time_diff, method='bfill')
I was hoping that I could somehow use the values in 'start' and 'stop' for slicing with iloc like
df.iloc[df['start']:df['stop'], 1].mean()
but this does not work.
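The slice fails because `iloc[a:b]` expects scalar bounds, not two Series. One possible way to still use per-row start and stop positions without a Python loop is a cumulative sum: the sum of rows `start..stop-1` is `csum[stop] - csum[start]`. The sketch below is an illustration under assumptions, not the only approach: it computes the positions with `numpy.searchsorted`, uses an exclusive `stop` (unlike the inclusive `Stop` column in the table), and treats the window as (t, t + 5s].

```python
import numpy as np
import pandas as pd

# Sample data from the question.
times = pd.to_datetime([
    "2019-01-02 08:30:00", "2019-01-02 08:30:01", "2019-01-02 08:30:02",
    "2019-01-02 08:30:03", "2019-01-02 08:30:04", "2019-01-02 08:30:05",
    "2019-01-02 08:30:07", "2019-01-02 08:30:09", "2019-01-02 08:30:10",
    "2019-01-02 08:30:11", "2019-01-02 08:30:12",
])
prices = np.array([56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
                   56.540, 56.550, 56.545, 56.550, 56.570])
df = pd.DataFrame({"Price": prices},
                  index=pd.DatetimeIndex(times, name="Timestamp"))

ts = df.index.values                 # sorted datetime64 array
offset = np.timedelta64(5, "s")      # assumed window length

# First row strictly after t (this excludes the current row).
start = np.searchsorted(ts, ts, side="right")
# One past the last row with timestamp <= t + 5s (exclusive stop).
stop = np.searchsorted(ts, ts + offset, side="right")

# Prefix sums with a leading zero so csum[k] is the sum of the first k rows.
csum = np.concatenate(([0.0], np.cumsum(prices)))
sums = csum[stop] - csum[start]
counts = stop - start

# Empty windows (e.g. the last row) stay NaN.
future_mean = np.full(len(df), np.nan)
np.divide(sums, counts, out=future_mean, where=counts > 0)

df["Mean future"] = future_mean
print(df)
```

Both `searchsorted` calls and the cumulative sum are vectorized, so this avoids calling a Python function once per row as `.apply()` does.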
Alternatively, I came up with this:
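def get_indexes(time_index, offset):
    # Positions of the current timestamp and of the first timestamp >= time_index + offset
    start, end = df.index.get_indexer([time_index, time_index + offset], method='bfill')
    # Skip the current row and average column 1 up to and including `end`
    avg = df.iloc[start + 1:end + 1, 1].mean()
    return avg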
which, when used with .apply(), is sadly far too slow to be useful.
Hopefully you can help me because I have been stuck on this problem for some time now.
Answer:
You can compute a forward-looking rolling average by reversing your dataframe, computing the rolling average, and then reversing the result again. You also need to specify closed='left'
(see the documentation) when doing this, since you don't want to include the current value in the forward average:
rolling = df.Price.rolling(window='5s', closed='both').mean().rename('Mean past')
rolling_forward = df[::-1].Price.rolling(window='5s', closed='left').mean()[::-1].rename('Mean future')
df[['Price']].merge(rolling, on='Timestamp').merge(rolling_forward, on='Timestamp')
This outputs:
                      Price  Mean past  Mean future
Timestamp
2019-01-02 08:30:00  56.565  56.565000    56.552000
2019-01-02 08:30:01  56.565  56.565000    56.548750
2019-01-02 08:30:02  56.565  56.565000    56.542500
2019-01-02 08:30:03  56.540  56.558750    56.543333
2019-01-02 08:30:04  56.545  56.556000    56.545000
2019-01-02 08:30:05  56.545  56.554167    56.545000
2019-01-02 08:30:07  56.540  56.547000    56.553750
2019-01-02 08:30:09  56.550  56.545000    56.555000
2019-01-02 08:30:10  56.545  56.545000    56.560000
2019-01-02 08:30:11  56.550  56.546250    56.570000
2019-01-02 08:30:12  56.570  56.551000          NaN
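For reference, here is a self-contained sketch of the reverse-and-roll approach on the sample data from the question. It assumes a reasonably recent pandas version, since time-based rolling on a decreasing index was not supported in older releases, and it assigns the columns directly instead of merging, which is only a stylistic choice.

```python
import pandas as pd

# Sample data from the question.
times = pd.to_datetime([
    "2019-01-02 08:30:00", "2019-01-02 08:30:01", "2019-01-02 08:30:02",
    "2019-01-02 08:30:03", "2019-01-02 08:30:04", "2019-01-02 08:30:05",
    "2019-01-02 08:30:07", "2019-01-02 08:30:09", "2019-01-02 08:30:10",
    "2019-01-02 08:30:11", "2019-01-02 08:30:12",
])
prices = [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
          56.540, 56.550, 56.545, 56.550, 56.570]
df = pd.DataFrame({"Price": prices},
                  index=pd.DatetimeIndex(times, name="Timestamp"))

price = df["Price"]

# Backward-looking mean over [t - 5s, t].
mean_past = price.rolling(window="5s", closed="both").mean()

# Forward-looking mean: reverse the series, roll with closed='left' so the
# current row is excluded, then reverse back to chronological order.
mean_future = price[::-1].rolling(window="5s", closed="left").mean()[::-1]

result = df.copy()
result["Mean past"] = mean_past
result["Mean future"] = mean_future
print(result)
```

The last row has no later observations, so its forward mean is NaN.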