I am trying to identify the minimum that occurs before the maximum within a rolling window, where the window starts on the row after the current one (convoluted, I know, but I can't find a better way to put it).
By way of an example:
First, I want to return the maximum value within a window of length n starting from the row after/below the current one, i.e. for this toy data, using window=3:
data = pd.Series([6,5,3,4,7,2,1])
The required output at this stage would be 5, 7, 7, 7. The 0th output is 5 because it is the highest of 5, 3, 4 (the 0th output looks at the values at indices 1, 2 and 3), the next output is 7 because it is the highest of the values at indices 2, 3 and 4, and so on.
This I can calculate using numpy stride_tricks:
np.max(np.lib.stride_tricks.sliding_window_view(data.values,3)[1:], axis=1)
which gives me array([5, 7, 7, 7]), which is what I want.
I can also find the ‘forward’ index of the max using:
np.argmax(np.lib.stride_tricks.sliding_window_view(data.values,3)[1:], axis=1)
which gives me the number of rows the max is after the first row to be observed.
What I am struggling with is that I also need to return the low within the window, but only the low that occurs BEFORE the high in that window (if there is no low before the high, I want to return the high).
I.e. going back to my toy data:
data = pd.Series([6,5,3,4,7,2,1])
I require the output:
5, 3, 4, 7 because:
0: 5, because 5 is the highest of 5, 3, 4 and there is no low before it in the window
1: 3, because 7 is the highest of 3, 4, 7 and 3 is the lowest number in the window before 7
2: 4, because 7 is the highest of 4, 7, 2 and 4 is the lowest number in the window before 7
3: 7, because 7 is the highest of 7, 2, 1 and there is no low before it in the window
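To pin down the rule described above, here is a minimal plain-Python sketch of it. The function name `min_before_max` and its parameters are made up for illustration, and it assumes that ties for the maximum resolve to the first occurrence. It is only meant as a slow reference to check faster versions against.

```python
def min_before_max(values, window):
    """Hypothetical reference helper: for each window that starts one row
    after position i, return the lowest value up to and including the max."""
    results = []
    # The window for row i covers rows i+1 .. i+window; stop once it no longer fits.
    for start in range(1, len(values) - window + 1):
        chunk = values[start:start + window]
        peak = chunk.index(max(chunk))  # position of the first maximum
        # The slice includes the maximum itself, so the maximum is returned
        # when nothing precedes it in the window.
        results.append(min(chunk[:peak + 1]))
    return results


print(min_before_max([6, 5, 3, 4, 7, 2, 1], 3))  # [5, 3, 4, 7]
```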
Thanks!
CodePudding user response:
The stride tricks approach is not fast, but it lets you treat each window as a separate row. You can use the fact that np.minimum is a ufunc, and has an accumulate method to find the smallest number seen so far along the axis. You can then use np.argmax instead of np.max to extract that value from each row (window):
windows = np.lib.stride_tricks.sliding_window_view(data, 3)
prior_minima = np.minimum.accumulate(windows, axis=1)
max_idx = np.argmax(windows, axis=1)
maxima = windows[np.arange(len(windows)), max_idx]
minima = prior_minima[np.arange(len(windows)), max_idx]
For your example:
>>> maxima
array([6, 5, 7, 7, 7])
>>> minima
array([6, 5, 3, 4, 7])
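The output above includes the window that starts at row 0. As a sketch of how this could be lined up with the question's "row after" convention, the first window can be dropped with `[1:]`, as the question's own snippet does. The variable names `values`, `running_min` and `rows` are introduced here for illustration only; the single-window line just shows what the running minimum looks like.

```python
import numpy as np

values = np.array([6, 5, 3, 4, 7, 2, 1])

# Running minimum of one window: each position holds the lowest value so far.
running_min = np.minimum.accumulate(np.array([4, 7, 2]))  # array([4, 4, 2])

# Drop the first window so that window i starts on the row after row i.
windows = np.lib.stride_tricks.sliding_window_view(values, 3)[1:]
prior_minima = np.minimum.accumulate(windows, axis=1)
rows = np.arange(len(windows))
max_idx = np.argmax(windows, axis=1)  # first maximum in each window

maxima = windows[rows, max_idx]       # array([5, 7, 7, 7])
minima = prior_minima[rows, max_idx]  # array([5, 3, 4, 7])
```

Reading the running minimum at the position of the maximum works because every value before the maximum is no larger than it, so the maximum itself is only returned when it comes first in the window.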
CodePudding user response:
Here is a pandas solution that is more memory efficient and faster for large series with large windows, because it doesn't need to create an intermediate array of shape (rows, window) to accumulate the minima. It was about 3x faster in a benchmark with 500_000 rows and a window size of 1000.
import pandas as pd
def min_b4_max(x):
    # x is a NumPy array (raw=True); slice up to and including the max, then take the min
    return x[x[:x.argmax() + 1].argmin()]
data = pd.Series([6,5,3,4,7,2,1])
data[1:].rolling(3).apply(min_b4_max, raw=True, engine='numba')
Output
1 NaN
2 NaN
3 5.0
4 3.0
5 4.0
6 7.0
dtype: float64
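If numba is not installed, the same function should also run with the default engine, only more slowly. The sketch below additionally drops the leading NaN rows to get the four values asked for in the question; the names `rolled` and `result` are introduced here for illustration, and casting back to int assumes the input holds integers.

```python
import pandas as pd


def min_b4_max(x):
    # x arrives as a NumPy array because raw=True is passed below
    return x[x[:x.argmax() + 1].argmin()]


data = pd.Series([6, 5, 3, 4, 7, 2, 1])

# Default engine (no numba); rolling results are floats with NaN for incomplete windows.
rolled = data[1:].rolling(3).apply(min_b4_max, raw=True)

# Keep only complete windows and cast back to integers.
result = rolled.dropna().astype(int).to_numpy()
print(result)  # [5 3 4 7]
```

Note that each value is labelled with the last row of its window, so the result for row 0 sits at label 3 in `rolled`.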