Pandas: Get average of a dynamic number of rows

I have a dataframe with a timestamp column/index, and I am calculating the moving average over the last 5 seconds:

df['Mid-Price'].rolling(window=time_diff, min_periods=1, closed='both').mean()

So far so good. Now I also need to calculate the moving average over the next 5 seconds. However, my timestamps are not evenly spaced, so I can't just shift the dataframe to recalculate the second average.

The data looks like this:

   Timestamp             Price    Start Stop 
0, 2019-01-02 08:30:00,  56.565,  0,    5
1, 2019-01-02 08:30:01,  56.565,  1,    6
2, 2019-01-02 08:30:02,  56.565,  2,    6
3, 2019-01-02 08:30:03,  56.540,  3,    7
4, 2019-01-02 08:30:04,  56.545,  4,    7
5, 2019-01-02 08:30:05,  56.545,  5,    8
6, 2019-01-02 08:30:07,  56.540,  6,    10
7, 2019-01-02 08:30:09,  56.550,  7,    12
8, 2019-01-02 08:30:10,  56.545,  8,    12
9, 2019-01-02 08:30:11,  56.550,  9,    12
10,2019-01-02 08:30:12,  56.570,  10,   13

For example, at index 5 the average over the last 5 seconds would be 56.5542. I also need to compute the average over the next 5 seconds, excluding the current time, i.e. over indices 6, 7 and 8 (56.545).

Using df.index.get_indexer() I am able to extract the index of the last row to be included in the average:

df['stop'] = df.index.get_indexer(df['Date-Time-Exch'] + time_diff, method='bfill')

I was hoping that I could somehow use the values in 'start' and 'stop' for slicing with iloc like

df.iloc[df['start']:df['stop'], 1].mean()

but this does not work.

Alternatively, I came up with this:

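def get_indexes(time_index, offset):
    # Positions of the current timestamp and of the first timestamp at or after time_index + offset
    start, end = df.index.get_indexer([time_index, time_index + offset], method='bfill')
    # Average the rows after the current one, up to and including the end position
    avg = df.iloc[start + 1:end + 1, 1].mean()
    return avg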

which, when used with .apply(), is sadly far too slow to be useful.

Hopefully you can help me because I have been stuck on this problem for some time now.

CodePudding user response：

You can compute a forward-looking rolling average by reversing your dataframe, calculating the rolling average, and then reversing the result again. You also need to specify closed='left' (see documentation) when doing this, since you don't want to include the current value in the forward average:

rolling = df.Price.rolling(window='5s', closed='both').mean().rename('Mean past')
rolling_forward = df[::-1].Price.rolling(window='5s', closed='left').mean()[::-1].rename('Mean future')
df[['Price']].merge(rolling, on='Timestamp').merge(rolling_forward, on='Timestamp')
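
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same reverse-roll-reverse idea. It rebuilds the sample data from the question; the names times, prices and result are only illustrative, and it uses assign instead of merge purely for brevity. It assumes a pandas version that accepts a monotonically decreasing DatetimeIndex for time-based windows, which the snippet above relies on as well.

```python
import pandas as pd

# Sample data from the question, indexed by timestamp.
times = pd.to_datetime([
    "2019-01-02 08:30:00", "2019-01-02 08:30:01", "2019-01-02 08:30:02",
    "2019-01-02 08:30:03", "2019-01-02 08:30:04", "2019-01-02 08:30:05",
    "2019-01-02 08:30:07", "2019-01-02 08:30:09", "2019-01-02 08:30:10",
    "2019-01-02 08:30:11", "2019-01-02 08:30:12",
])
prices = [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
          56.540, 56.550, 56.545, 56.550, 56.570]
df = pd.DataFrame({"Price": prices},
                  index=pd.DatetimeIndex(times, name="Timestamp"))

# Backward-looking mean over [t - 5s, t] (both endpoints included).
mean_past = df["Price"].rolling("5s", closed="both").mean()

# Forward-looking mean over (t, t + 5s]: reverse the series, roll with
# closed='left' so the current row is excluded, then reverse back.
mean_future = df["Price"][::-1].rolling("5s", closed="left").mean()[::-1]

# Assignment aligns on the Timestamp index.
result = df.assign(**{"Mean past": mean_past, "Mean future": mean_future})
print(result)
```

The last row has no later observations, so its forward mean is NaN.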

This will output:

                    Price   Mean past   Mean future
Timestamp           
2019-01-02 08:30:00 56.565  56.565000   56.552000
2019-01-02 08:30:01 56.565  56.565000   56.548750
2019-01-02 08:30:02 56.565  56.565000   56.542500
2019-01-02 08:30:03 56.540  56.558750   56.543333
2019-01-02 08:30:04 56.545  56.556000   56.545000
2019-01-02 08:30:05 56.545  56.554167   56.545000
2019-01-02 08:30:07 56.540  56.547000   56.553750
2019-01-02 08:30:09 56.550  56.545000   56.555000
2019-01-02 08:30:10 56.545  56.545000   56.560000
2019-01-02 08:30:11 56.550  56.546250   56.570000
2019-01-02 08:30:12 56.570  56.551000   NaN
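
As a side note, the start/stop idea from the question can also be vectorized. An expression like df.iloc[df['start']:df['stop'], 1] fails because slice bounds have to be scalars, not whole columns, but per-row window bounds combined with a prefix sum give every window mean without a Python-level loop. The following is only a sketch of that alternative, not the method used in the answer above. It uses plain NumPy, represents the timestamps as seconds after 08:30:00 for brevity, and the names t, price, window, start, stop and csum are illustrative.

```python
import numpy as np

# Sorted timestamps (seconds after 08:30:00) and prices from the question.
t = np.array([0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12])
price = np.array([56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
                  56.540, 56.550, 56.545, 56.550, 56.570])
window = 5  # look-ahead in seconds

# The forward window (t[i], t[i] + window] covers positions start[i] .. stop[i] - 1.
start = np.searchsorted(t, t, side="right")           # first position strictly after t[i]
stop = np.searchsorted(t, t + window, side="right")   # one past the last position <= t[i] + window

# Prefix sums turn each window sum into a difference of two lookups.
csum = np.concatenate(([0.0], np.cumsum(price)))
count = stop - start
total = csum[stop] - csum[start]

# Rows with an empty forward window keep NaN.
mean_future = np.full(len(price), np.nan)
np.divide(total, count, out=mean_future, where=count > 0)
print(mean_future)
```

For index 5 this averages positions 6, 7 and 8, giving 56.545, the value the question asks for. With a real DatetimeIndex the same approach should carry over by searching df.index.values with an np.timedelta64 offset. Note that side="right" on the start bound also excludes any other rows that share the current timestamp.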