Efficient and elegant way to get sub-range maximum values of a Pandas Series

Time:09-08

Say I have a pandas Series:

index | value
-------------
0     | 2
1     | 0
2     | 8
3     | 0
4     | 1
5     | 2
6     | 7
7     | 4
8     | 2
9     | 9
10    | 0
11    | 0

I need to get a Series (or array) of subrange maximum values. For example, with a subrange of length 5, the first value should be max{2, 0, 8, 0, 1} = 8 and the second value should be max{0, 8, 0, 1, 2} = 8.

Starting from index 8, there are fewer than 5 elements left in the subrange. In that case the value should just be the maximum of the remaining elements.

The result should look like this:

index | value
-------------
0     | 8
1     | 8
2     | 8
3     | 7
4     | 7
5     | 9
6     | 9
7     | 9
8     | 9
9     | 9
10    | 0
11    | 0

I know we can do this by simply iterating over the Series. But as far as I know, that is not very efficient if we use iloc or iterate with iterrows(). Is there a more efficient and elegant way to do this? I have heard that vectorized operations should be very fast, but I haven't figured out how to use them here.

Answer:

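You can use rolling on the reversed Series. Reversing turns the forward-looking window into an ordinary trailing window, and min_periods=1 lets the shorter windows at the end still produce a value. Assigning the result back to the column realigns it by index: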

df['value'] = df['value'].iloc[::-1].rolling(5, min_periods=1).max()
df['value']
Out[158]:
0     8.0
1     8.0
2     8.0
3     7.0
4     7.0
5     9.0
6     9.0
7     9.0
8     9.0
9     9.0
10    0.0
11    0.0
Name: value, dtype: float64
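
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on a standalone Series. The variable names (s, reversed_max, forward_max) are made up for illustration. The second approach uses FixedForwardWindowIndexer from pandas.api.indexers, which is not part of the answer above and assumes pandas 1.1 or newer; it should give the same result without reversing.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Sample data from the question.
s = pd.Series([2, 0, 8, 0, 1, 2, 7, 4, 2, 9, 0, 0], name="value")

# Approach 1: reverse, take a trailing rolling max, then restore the order.
# min_periods=1 keeps the shorter windows at the end from becoming NaN.
reversed_max = s.iloc[::-1].rolling(5, min_periods=1).max().iloc[::-1]

# Approach 2 (assumes pandas >= 1.1): a forward-looking window indexer,
# so each window covers the current row and the next four rows.
indexer = FixedForwardWindowIndexer(window_size=5)
forward_max = s.rolling(indexer, min_periods=1).max()

# rolling returns floats; cast back to int only for display.
print(forward_max.astype(int).tolist())
```

Both variables should hold the values from the expected table in the question. If the input can contain NaN, the cast back to int at the end would fail, so it is probably safer to leave the result as float in that case.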