Home > OS >  Find the row offset for the maximum value over the next N rows in Pandas?
Find the row offset for the maximum value over the next N rows in Pandas?

Time:03-19

I have some data in a Pandas DataFrame:

  Price
1 $10
2 $11
3 $15
4 $18
5 $13
6 $4
7 $25

and I am trying to get the offset for the maximum of the next N rows. For example, when ****, the output would look like

  Price  offset
1 $10    2   <- offset is defined as the row offset of the maximum for the next two (N) values ($11 and $15)
2 $11    2   <- similarly, the offset is here is the row offset of the maximum for the next two (N) values ($15 and $18)
3 $15    1
4 $18    1
5 $13    2
6 $4     1    
7 $25    0

I can get the value of the maximum over the next N rows using:

# Here, N=12
df['max_price'] = df['Price'].rolling(12).max().shift(-11)

However, is it possible to get the row offset position for the maximum value of the next N rows using similar logic? Any pointers would be very helpful.

CodePudding user response:

I don't think rolling has the implementation you're looking for. However, there are other ways to achieve it.

One option is to do separate rolling calculations for each unique "offset" value and use numpy.select to assign values:

unique_vals = df['offset'].unique()
condlist = [df['offset']==os for os in unique_vals]
choicelist = [df['Price'].rolling(ws).max().shift(-ws 1) for ws in unique_vals]
df['new'] = np.select(condlist, choicelist)    

Another option could be to use iterrows and simply slice df and find max:

df['new'] = [df.loc[i:i row['offset']-1, 'Price'].max() for i, row in df.iterrows()]

Output:

   Price  offset   new
1     10       2  11.0
2     11       2  15.0
3     15       1  15.0
4     18       1  18.0
5     13       2  13.0
6      4       1   4.0
7     25       0   NaN

CodePudding user response:

You can use rolling, but you need to cheat a bit by rolling on the reversed series.

s = df['Price'].str.strip('$').astype(int)

N = 2
df['offset'] = (s.iloc[::-1]
                .rolling(N, min_periods=1)
                .apply(lambda s: s.idxmax())
                .shift()
                .iloc[::-1]
                -df.index
                ).fillna(0, downcast='infer')

Output:

  Price  offset
1   $10       2
2   $11       2
3   $15       1
4   $18       1
5   $13       2
6    $4       1
7   $25       0
  • Related