I am trying to figure out how I can mark the rows where the price are part of 4 increase prices . the "is_consecutive" is actually the mark .
I managed to do the diff between the rows :
df['diff1'] = df['Close'].diff()
But I didn't managed to find out which row is a part of 4 increase prices .
I had a thought to use df.rolling() .
The exmple df,
On rows 0-3 , we need to get an output of 'True' on the ["is_consecutive"] column , because the ['diff1'] on this consecutive rows is increase for 4 rows .
On rows 8-11 , we need to get an output of 'False' on the ["is_consecutive"] column , because the ['diff1'] on this consecutive rows is zero .
Date Price diff1 is_consecutive
0 1/22/20 0 0 True
1 1/23/20 130 130 True
2 1/24/20 144 14 True
3 1/25/20 150 6 True
4 1/27/20 60 -90 False
5 1/28/20 95 35 False
6 1/29/20 100 5 False
7 1/30/20 50 -50 False
8 2/01/20 100 0 False
9 1/02/20 100 0 False
10 1/03/20 100 0 False
11 1/04/20 100 0 False
12 1/05/20 50 -50 False
general example :
if price = [30,55,60,65,25]
the different form the consecutive number on the list will be :
diff1 = [0,25,5,5,-40]
So when the diff1 is plus its actually means the consecutive prices are increase .
I need to mark(in the df) the rows that have 4 consecutive that go up.
Thank You for help (-:
CodePudding user response:
Try: .rolling
with window of size 4
and min periods 1
:
df["is_consecutive"] = (
df["Price"]
.rolling(4, min_periods=1)
.apply(lambda x: (x.diff().fillna(0) >= 0).all())
.astype(bool)
)
print(df)
Prints:
Date Price is_consecutive
0 1/22/20 0 True
1 1/23/20 130 True
2 1/24/20 144 True
3 1/25/20 150 True
4 1/26/20 60 False
5 1/26/20 95 False
6 1/26/20 100 False
7 1/26/20 50 False
CodePudding user response:
Assuming the dataframe is sorted. One way is based on the cumsum of the differences to identify the first time an upward Price move succeeding a 3 days upwards trend (i.e. 4 days of upward trend).
quant1 = (df['Price'].diff().apply(np.sign) == 1).cumsum()
quant2 = (df['Price'].diff().apply(np.sign) == 1).cumsum().where(~(df['Price'].diff().apply(np.sign) == 1)).ffill().fillna(0).astype(int)
df['is_consecutive'] = (quant1-quant2) >= 3
note that the above takes into account only strictly increasing Prices (not equal).
Then we override also the is_consecutive tag for the previous 3 Prices to be also TRUE using the win_view self defined function:
def win_view(x, size):
if isinstance(x, list):
x = np.array(x)
if isinstance(x, pd.core.series.Series):
x = x.values
if isinstance(x, np.ndarray):
pass
else:
raise Exception('wrong type')
return np.lib.stride_tricks.as_strided(
x,
shape=(x.size - size 1, size),
strides=(x.strides[0], x.strides[0])
)
arr = win_view(df['is_consecutive'], 4)
arr[arr[:,3]] = True
Note that we inplace replace the values to be True.
EDIT 1 Inspired by the self defined win_view function, I realized that the solution it can be obtained simply by win_view (without the need of using cumsums) as below:
df['is_consecutive'] = False
arr = win_view(df['Price'].diff(), 4)
arr_ind = win_view(list(df['Price'].index), 4)
mask = arr_ind[np.all(arr[:, 1:] > 0, axis=1)].flatten()
df.loc[mask, 'is_consecutive'] = True
We maintain 2 arrays, 1 for the returns and 1 for the indices. We collect the indices where we have 3 consecutive positive return np.all(arr[:, 1:] > 0, axis=1
(i.e. 4 upmoving prices) and we replace those in our original df.