I have a dataframe like this.
And I want to recognize the some rows, so that any recognized row is in a block with more than 3 (>=4) consecutive zeros. (marked as cannot_trade)
What I've tried is:
df['cannot_trade'] = (
(df['volume'] == 0) & (df['volume'].shift(1) == 0) & (df['volume'].shift(2) == 0) & (df['volume'].shift(3) == 0) ) | (
(df['volume'] == 0) & (df['volume'].shift(-1) == 0) & (df['volume'].shift(-2) == 0) & (df['volume'].shift(-3) == 0))
It does recognize long consecutive zeros but we can see in the dataframe that at 'closetime'=='2022-10-17 23:13:00' and 'closetime'=='2022-10-17 23:14:00', the recognition fail, where 'cannot_trade' should have been set 'True'.
Do you have any idea how to fix this?
Here's the text format of the dataframe:
close volume symbol closetime cannot_trade
63292514 0.2544 910 ZRX-USDT-SWAP 2022-10-17 23:01:00 False
63292515 0.2544 0 ZRX-USDT-SWAP 2022-10-17 23:02:00 False
63292516 0.2544 0 ZRX-USDT-SWAP 2022-10-17 23:03:00 False
63292517 0.2544 0 ZRX-USDT-SWAP 2022-10-17 23:04:00 False
63292518 0.2543 222 ZRX-USDT-SWAP 2022-10-17 23:05:00 False
63292519 0.2543 0 ZRX-USDT-SWAP 2022-10-17 23:06:00 False
63292520 0.2546 1119 ZRX-USDT-SWAP 2022-10-17 23:07:00 False
63292521 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:08:00 False
63292522 0.2549 84 ZRX-USDT-SWAP 2022-10-17 23:09:00 False
63292523 0.2549 0 ZRX-USDT-SWAP 2022-10-17 23:10:00 False
63292524 0.2548 181 ZRX-USDT-SWAP 2022-10-17 23:11:00 False
63292525 0.2548 0 ZRX-USDT-SWAP 2022-10-17 23:12:00 True
63292526 0.2548 0 ZRX-USDT-SWAP 2022-10-17 23:13:00 False
63292527 0.2548 0 ZRX-USDT-SWAP 2022-10-17 23:14:00 False
63292528 0.2548 0 ZRX-USDT-SWAP 2022-10-17 23:15:00 True
63292529 0.2546 222 ZRX-USDT-SWAP 2022-10-17 23:16:00 False
63292530 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:17:00 True
63292531 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:18:00 True
63292532 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:19:00 True
63292533 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:20:00 True
63292534 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:21:00 True
63292535 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:22:00 True
63292536 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:23:00 True
63292537 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:24:00 True
63292538 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:25:00 True
63292539 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:26:00 True
63292540 0.2546 0 ZRX-USDT-SWAP 2022-10-17 23:27:00 True
63292541 0.2545 100 ZRX-USDT-SWAP 2022-10-17 23:28:00 False
63292542 0.2545 0 ZRX-USDT-SWAP 2022-10-17 23:29:00 False
63292543 0.2545 0 ZRX-USDT-SWAP 2022-10-17 23:30:00 False
63292544 0.2547 141 ZRX-USDT-SWAP 2022-10-17 23:31:00 False
63292545 0.2548 3 ZRX-USDT-SWAP 2022-10-17 23:32:00 False
63292546 0.2548 0 ZRX-USDT-SWAP 2022-10-17 23:33:00 False
63292547 0.2548 0 ZRX-USDT-SWAP 2022-10-17 23:34:00 False
63292548 0.2552 130 ZRX-USDT-SWAP 2022-10-17 23:35:00 False
63292549 0.2555 214 ZRX-USDT-SWAP 2022-10-17 23:36:00 False
63292550 0.2555 0 ZRX-USDT-SWAP 2022-10-17 23:37:00 True
63292551 0.2555 0 ZRX-USDT-SWAP 2022-10-17 23:38:00 True
63292552 0.2555 0 ZRX-USDT-SWAP 2022-10-17 23:39:00 False
63292553 0.2555 0 ZRX-USDT-SWAP 2022-10-17 23:40:00 True
63292554 0.2555 0 ZRX-USDT-SWAP 2022-10-17 23:41:00 True
63292555 0.2557 103 ZRX-USDT-SWAP 2022-10-17 23:42:00 False
63292556 0.2557 0 ZRX-USDT-SWAP 2022-10-17 23:43:00 False
63292557 0.2557 127 ZRX-USDT-SWAP 2022-10-17 23:44:00 False
63292558 0.2557 0 ZRX-USDT-SWAP 2022-10-17 23:45:00 False
63292559 0.2559 91 ZRX-USDT-SWAP 2022-10-17 23:46:00 False
63292560 0.2558 100 ZRX-USDT-SWAP 2022-10-17 23:47:00 False
63292561 0.2558 0 ZRX-USDT-SWAP 2022-10-17 23:48:00 False
63292562 0.2558 39 ZRX-USDT-SWAP 2022-10-17 23:49:00 False
63292563 0.2558 0 ZRX-USDT-SWAP 2022-10-17 23:50:00 False
63292564 0.2558 0 ZRX-USDT-SWAP 2022-10-17 23:51:00 False
63292565 0.2558 0 ZRX-USDT-SWAP 2022-10-17 23:52:00 False
63292566 0.2555 260 ZRX-USDT-SWAP 2022-10-17 23:53:00 False
63292567 0.2555 0 ZRX-USDT-SWAP 2022-10-17 23:54:00 False
63292568 0.2555 0 ZRX-USDT-SWAP 2022-10-17 23:55:00 False
63292569 0.2550 375 ZRX-USDT-SWAP 2022-10-17 23:56:00 False
63292570 0.2549 43 ZRX-USDT-SWAP 2022-10-17 23:57:00 False
63292571 0.2549 195 ZRX-USDT-SWAP 2022-10-17 23:58:00 False
63292572 0.2550 141 ZRX-USDT-SWAP 2022-10-17 23:59:00 False
63292573 0.2550 209 ZRX-USDT-SWAP 2022-10-18 00:00:00 False
CodePudding user response:
You can use a groupby.transform('count')
:
N = 4
m = df['volume'].ne(0)
df['cannot_trade'] = (df.groupby(m.cumsum())['volume']
.transform('count').gt(N)
& (~m)
)
Example:
volume cannot_trade
0 0 False
1 0 False
2 0 False
3 1 False
4 0 True
5 0 True
6 0 True
7 0 True
8 2 False
9 1 False
10 0 False
11 1 False
CodePudding user response:
Create groups by compare for not equal 0
with cumualtive sum ad replace NaN
for m
values, last count values by Series.value_counts
with Series.map
for new column, last compare for greater or equal by N
:
df = pd.DataFrame({'volume':[0,0,1,1,0,0,0,0]})
N = 4
m = df['volume'].ne(0)
s = m.cumsum().mask(m)
df['cannot_trade'] = s.map(s.value_counts()).ge(N)
print (df)
volume cannot_trade
0 0 False
1 0 False
2 1 False
3 1 False
4 0 True
5 0 True
6 0 True
7 0 True
Details of solution:
N = 4
m = df['volume'].ne(0)
s = m.cumsum().mask(m)
print (df.assign(m = df['volume'].ne(0),
s = m.cumsum().mask(m),
count = s.map(s.value_counts()),
cannot_trade = s.map(s.value_counts()).ge(N)))
volume m s count cannot_trade
0 0 False 0.0 2.0 False
1 0 False 0.0 2.0 False
2 1 True NaN NaN False
3 1 True NaN NaN False
4 0 False 2.0 4.0 True
5 0 False 2.0 4.0 True
6 0 False 2.0 4.0 True
7 0 False 2.0 4.0 True
CodePudding user response:
If i have understood correctlly you could apply a rolling window-sum approach to check if 3 continuos events are equal to 0 and after that apply vanilla python filters. Assuming so is the name of your df and "a" the column you want to check and "b" the boolean column:
so["sum"] = so["a"].rolling(4).sum()
And now the filtering:
so.loc[(so["sum"] == 0) & (so["b"] == True), :]