Calculate maximum difference of rolling interval of n columns-CodePudding

I have a dataset

                         Time   Spot Ubalance
0   2017-01-01T00:00:00 01:00  20.96      NaN
1   2017-01-01T01:00:00 01:00  20.90    29.40
2   2017-01-01T02:00:00 01:00  18.13    24.73
3   2017-01-01T03:00:00 01:00  16.03    24.73
4   2017-01-01T04:00:00 01:00  16.43    27.89
5   2017-01-01T05:00:00 01:00  13.75    28.26
6   2017-01-01T06:00:00 01:00  11.10    30.43
7   2017-01-01T07:00:00 01:00  15.47    32.85
8   2017-01-01T08:00:00 01:00  16.88    33.91
9   2017-01-01T09:00:00 01:00  21.81    28.58
10  2017-01-01T10:00:00 01:00  26.24    28.58

I want to generate a series/dataframe in which I calculate the maximum difference between the highest and lowest value of the last n rows within multiple columns, i.e., the maximum difference of these "last" 10 rows would be

33.91 (highest is here in "ubalance") - 11.10 (lowest is in "Spot") = 22.81

I've tried .rolling() but it apparently does not contain a difference attribute.

Expected outcome:

                         Time   Spot Ubalance  Diff
0   2017-01-01T00:00:00 01:00  20.96      NaN  NaN
1   2017-01-01T01:00:00 01:00  20.90    29.40  NaN
2   2017-01-01T02:00:00 01:00  18.13    24.73  NaN
3   2017-01-01T03:00:00 01:00  16.03    24.73  NaN
4   2017-01-01T04:00:00 01:00  16.43    27.89  NaN
5   2017-01-01T05:00:00 01:00  13.75    28.26  NaN
6   2017-01-01T06:00:00 01:00  11.10    30.43  NaN
7   2017-01-01T07:00:00 01:00  15.47    32.85  NaN
8   2017-01-01T08:00:00 01:00  16.88    33.91  NaN
9   2017-01-01T09:00:00 01:00  21.81    28.58  NaN
10  2017-01-01T10:00:00 01:00  26.24    28.58  22.81

CodePudding user response：

Use Rolling.aggregate and then subtract:

df1 = df['Spot'].rolling(10).agg(['min','max'])
print (df1)
     min    max
0    NaN    NaN
1    NaN    NaN
2    NaN    NaN
3    NaN    NaN
4    NaN    NaN
5    NaN    NaN
6    NaN    NaN
7    NaN    NaN
8    NaN    NaN
9   11.1  21.81
10  11.1  26.24

df['dif'] = df1['max'].sub(df1['min'])
print (df)
                         Time   Spot  Ubalance    dif
0   2017-01-01T00:00:00 01:00  20.96       NaN    NaN
1   2017-01-01T01:00:00 01:00  20.90     29.40    NaN
2   2017-01-01T02:00:00 01:00  18.13     24.73    NaN
3   2017-01-01T03:00:00 01:00  16.03     24.73    NaN
4   2017-01-01T04:00:00 01:00  16.43     27.89    NaN
5   2017-01-01T05:00:00 01:00  13.75     28.26    NaN
6   2017-01-01T06:00:00 01:00  11.10     30.43    NaN
7   2017-01-01T07:00:00 01:00  15.47     32.85    NaN
8   2017-01-01T08:00:00 01:00  16.88     33.91    NaN
9   2017-01-01T09:00:00 01:00  21.81     28.58  10.71
10  2017-01-01T10:00:00 01:00  26.24     28.58  15.14

Or custom function with lambda:

df['diff'] = df['Spot'].rolling(10).agg(lambda x: x.max() - x.min())

EDIT:

For processing all columns from list use:

cols = ['Spot','Ubalance']
N = 10

df['dif'] = (df[cols].stack(dropna=False)
                     .rolling(len(cols) * N)
                     .agg(lambda x: x.max() - x.min())
                     .groupby(level=0)
                     .max())

print (df)
                         Time   Spot  Ubalance    dif
0   2017-01-01T00:00:00 01:00  20.96       NaN    NaN
1   2017-01-01T01:00:00 01:00  20.90     29.40    NaN
2   2017-01-01T02:00:00 01:00  18.13     24.73    NaN
3   2017-01-01T03:00:00 01:00  16.03     24.73    NaN
4   2017-01-01T04:00:00 01:00  16.43     27.89    NaN
5   2017-01-01T05:00:00 01:00  13.75     28.26    NaN
6   2017-01-01T06:00:00 01:00  11.10     30.43    NaN
7   2017-01-01T07:00:00 01:00  15.47     32.85    NaN
8   2017-01-01T08:00:00 01:00  16.88     33.91    NaN
9   2017-01-01T09:00:00 01:00  21.81     28.58    NaN
10  2017-01-01T10:00:00 01:00  26.24     28.58  22.81

CodePudding user response：

you could a rolling window like this:

n = 10
df.rolling(3).apply(func=lambda x: x.max() - x.min())

you can specify in the lambda function the column you want to do the rolling window