I have some data in a Pandas DataFrame:
Price
1 $10
2 $11
3 $15
4 $18
5 $13
6 $4
7 $25
and I am trying to get the offset for the maximum of the next N
rows. For example, when ****, the output would look like
Price offset
1 $10 2 <- offset is defined as the row offset of the maximum for the next two (N) values ($11 and $15)
2 $11 2 <- similarly, the offset is here is the row offset of the maximum for the next two (N) values ($15 and $18)
3 $15 1
4 $18 1
5 $13 2
6 $4 1
7 $25 0
I can get the value of the maximum over the next N
rows using:
# Here, N=12
df['max_price'] = df['Price'].rolling(12).max().shift(-11)
However, is it possible to get the row offset position for the maximum value of the next N rows using similar logic? Any pointers would be very helpful.
CodePudding user response:
I don't think rolling
has the implementation you're looking for. However, there are other ways to achieve it.
One option is to do separate rolling calculations for each unique "offset" value and use numpy.select
to assign values:
unique_vals = df['offset'].unique()
condlist = [df['offset']==os for os in unique_vals]
choicelist = [df['Price'].rolling(ws).max().shift(-ws 1) for ws in unique_vals]
df['new'] = np.select(condlist, choicelist)
Another option could be to use iterrows
and simply slice df
and find max
:
df['new'] = [df.loc[i:i row['offset']-1, 'Price'].max() for i, row in df.iterrows()]
Output:
Price offset new
1 10 2 11.0
2 11 2 15.0
3 15 1 15.0
4 18 1 18.0
5 13 2 13.0
6 4 1 4.0
7 25 0 NaN
CodePudding user response:
You can use rolling
, but you need to cheat a bit by rolling on the reversed series.
s = df['Price'].str.strip('$').astype(int)
N = 2
df['offset'] = (s.iloc[::-1]
.rolling(N, min_periods=1)
.apply(lambda s: s.idxmax())
.shift()
.iloc[::-1]
-df.index
).fillna(0, downcast='infer')
Output:
Price offset
1 $10 2
2 $11 2
3 $15 1
4 $18 1
5 $13 2
6 $4 1
7 $25 0