I have a dataset
df
Time Spot Ubalance
0 2017-01-01T00:00:00 01:00 20.96 NaN
1 2017-01-01T01:00:00 01:00 20.90 29.40
2 2017-01-01T02:00:00 01:00 18.13 24.73
3 2017-01-01T03:00:00 01:00 16.03 24.73
4 2017-01-01T04:00:00 01:00 16.43 27.89
5 2017-01-01T05:00:00 01:00 13.75 28.26
6 2017-01-01T06:00:00 01:00 11.10 30.43
7 2017-01-01T07:00:00 01:00 15.47 32.85
8 2017-01-01T08:00:00 01:00 16.88 33.91
9 2017-01-01T09:00:00 01:00 21.81 28.58
10 2017-01-01T10:00:00 01:00 26.24 28.58
I want to generate a series/dataframe in which I calculate the maximum difference between the highest and lowest value of the last n rows within multiple columns, i.e., the maximum difference of these "last" 10 rows would be
33.91 (highest is here in "ubalance") - 11.10 (lowest is in "Spot") = 22.81
I've tried .rolling() but it apparently does not contain a difference attribute.
Expected outcome:
Time Spot Ubalance Diff
0 2017-01-01T00:00:00 01:00 20.96 NaN NaN
1 2017-01-01T01:00:00 01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00 01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00 01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00 01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00 01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00 01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00 01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00 01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00 01:00 21.81 28.58 NaN
10 2017-01-01T10:00:00 01:00 26.24 28.58 22.81
CodePudding user response:
Use Rolling.aggregate
and then subtract:
df1 = df['Spot'].rolling(10).agg(['min','max'])
print (df1)
min max
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 11.1 21.81
10 11.1 26.24
df['dif'] = df1['max'].sub(df1['min'])
print (df)
Time Spot Ubalance dif
0 2017-01-01T00:00:00 01:00 20.96 NaN NaN
1 2017-01-01T01:00:00 01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00 01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00 01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00 01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00 01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00 01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00 01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00 01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00 01:00 21.81 28.58 10.71
10 2017-01-01T10:00:00 01:00 26.24 28.58 15.14
Or custom function with lambda
:
df['diff'] = df['Spot'].rolling(10).agg(lambda x: x.max() - x.min())
EDIT:
For processing all columns from list use:
cols = ['Spot','Ubalance']
N = 10
df['dif'] = (df[cols].stack(dropna=False)
.rolling(len(cols) * N)
.agg(lambda x: x.max() - x.min())
.groupby(level=0)
.max())
print (df)
Time Spot Ubalance dif
0 2017-01-01T00:00:00 01:00 20.96 NaN NaN
1 2017-01-01T01:00:00 01:00 20.90 29.40 NaN
2 2017-01-01T02:00:00 01:00 18.13 24.73 NaN
3 2017-01-01T03:00:00 01:00 16.03 24.73 NaN
4 2017-01-01T04:00:00 01:00 16.43 27.89 NaN
5 2017-01-01T05:00:00 01:00 13.75 28.26 NaN
6 2017-01-01T06:00:00 01:00 11.10 30.43 NaN
7 2017-01-01T07:00:00 01:00 15.47 32.85 NaN
8 2017-01-01T08:00:00 01:00 16.88 33.91 NaN
9 2017-01-01T09:00:00 01:00 21.81 28.58 NaN
10 2017-01-01T10:00:00 01:00 26.24 28.58 22.81
CodePudding user response:
you could a rolling window like this:
n = 10
df.rolling(3).apply(func=lambda x: x.max() - x.min())
you can specify in the lambda function the column you want to do the rolling window