Home > Software design >  In dataframe, how to recognize rows with more than 3 consecutive zeros?
In dataframe, how to recognize rows with more than 3 consecutive zeros?

Time:11-15

I have a dataframe like this.

A dataframe with consecutive zeros in column 'volume'.

And I want to recognize the some rows, so that any recognized row is in a block with more than 3 (>=4) consecutive zeros. (marked as cannot_trade)

What I've tried is:

df['cannot_trade'] = (
(df['volume'] == 0) & (df['volume'].shift(1) == 0) & (df['volume'].shift(2) == 0) & (df['volume'].shift(3) == 0) ) | (
(df['volume'] == 0) & (df['volume'].shift(-1) == 0) & (df['volume'].shift(-2) == 0) & (df['volume'].shift(-3) == 0)) 

It does recognize long consecutive zeros but we can see in the dataframe that at 'closetime'=='2022-10-17 23:13:00' and 'closetime'=='2022-10-17 23:14:00', the recognition fail, where 'cannot_trade' should have been set 'True'.

Do you have any idea how to fix this?

Here's the text format of the dataframe:

 close  volume         symbol           closetime  cannot_trade
63292514  0.2544     910  ZRX-USDT-SWAP 2022-10-17 23:01:00         False
63292515  0.2544       0  ZRX-USDT-SWAP 2022-10-17 23:02:00         False
63292516  0.2544       0  ZRX-USDT-SWAP 2022-10-17 23:03:00         False
63292517  0.2544       0  ZRX-USDT-SWAP 2022-10-17 23:04:00         False
63292518  0.2543     222  ZRX-USDT-SWAP 2022-10-17 23:05:00         False
63292519  0.2543       0  ZRX-USDT-SWAP 2022-10-17 23:06:00         False
63292520  0.2546    1119  ZRX-USDT-SWAP 2022-10-17 23:07:00         False
63292521  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:08:00         False
63292522  0.2549      84  ZRX-USDT-SWAP 2022-10-17 23:09:00         False
63292523  0.2549       0  ZRX-USDT-SWAP 2022-10-17 23:10:00         False
63292524  0.2548     181  ZRX-USDT-SWAP 2022-10-17 23:11:00         False
63292525  0.2548       0  ZRX-USDT-SWAP 2022-10-17 23:12:00          True
63292526  0.2548       0  ZRX-USDT-SWAP 2022-10-17 23:13:00         False
63292527  0.2548       0  ZRX-USDT-SWAP 2022-10-17 23:14:00         False
63292528  0.2548       0  ZRX-USDT-SWAP 2022-10-17 23:15:00          True
63292529  0.2546     222  ZRX-USDT-SWAP 2022-10-17 23:16:00         False
63292530  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:17:00          True
63292531  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:18:00          True
63292532  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:19:00          True
63292533  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:20:00          True
63292534  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:21:00          True
63292535  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:22:00          True
63292536  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:23:00          True
63292537  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:24:00          True
63292538  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:25:00          True
63292539  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:26:00          True
63292540  0.2546       0  ZRX-USDT-SWAP 2022-10-17 23:27:00          True
63292541  0.2545     100  ZRX-USDT-SWAP 2022-10-17 23:28:00         False
63292542  0.2545       0  ZRX-USDT-SWAP 2022-10-17 23:29:00         False
63292543  0.2545       0  ZRX-USDT-SWAP 2022-10-17 23:30:00         False
63292544  0.2547     141  ZRX-USDT-SWAP 2022-10-17 23:31:00         False
63292545  0.2548       3  ZRX-USDT-SWAP 2022-10-17 23:32:00         False
63292546  0.2548       0  ZRX-USDT-SWAP 2022-10-17 23:33:00         False
63292547  0.2548       0  ZRX-USDT-SWAP 2022-10-17 23:34:00         False
63292548  0.2552     130  ZRX-USDT-SWAP 2022-10-17 23:35:00         False
63292549  0.2555     214  ZRX-USDT-SWAP 2022-10-17 23:36:00         False
63292550  0.2555       0  ZRX-USDT-SWAP 2022-10-17 23:37:00          True
63292551  0.2555       0  ZRX-USDT-SWAP 2022-10-17 23:38:00          True
63292552  0.2555       0  ZRX-USDT-SWAP 2022-10-17 23:39:00         False
63292553  0.2555       0  ZRX-USDT-SWAP 2022-10-17 23:40:00          True
63292554  0.2555       0  ZRX-USDT-SWAP 2022-10-17 23:41:00          True
63292555  0.2557     103  ZRX-USDT-SWAP 2022-10-17 23:42:00         False
63292556  0.2557       0  ZRX-USDT-SWAP 2022-10-17 23:43:00         False
63292557  0.2557     127  ZRX-USDT-SWAP 2022-10-17 23:44:00         False
63292558  0.2557       0  ZRX-USDT-SWAP 2022-10-17 23:45:00         False
63292559  0.2559      91  ZRX-USDT-SWAP 2022-10-17 23:46:00         False
63292560  0.2558     100  ZRX-USDT-SWAP 2022-10-17 23:47:00         False
63292561  0.2558       0  ZRX-USDT-SWAP 2022-10-17 23:48:00         False
63292562  0.2558      39  ZRX-USDT-SWAP 2022-10-17 23:49:00         False
63292563  0.2558       0  ZRX-USDT-SWAP 2022-10-17 23:50:00         False
63292564  0.2558       0  ZRX-USDT-SWAP 2022-10-17 23:51:00         False
63292565  0.2558       0  ZRX-USDT-SWAP 2022-10-17 23:52:00         False
63292566  0.2555     260  ZRX-USDT-SWAP 2022-10-17 23:53:00         False
63292567  0.2555       0  ZRX-USDT-SWAP 2022-10-17 23:54:00         False
63292568  0.2555       0  ZRX-USDT-SWAP 2022-10-17 23:55:00         False
63292569  0.2550     375  ZRX-USDT-SWAP 2022-10-17 23:56:00         False
63292570  0.2549      43  ZRX-USDT-SWAP 2022-10-17 23:57:00         False
63292571  0.2549     195  ZRX-USDT-SWAP 2022-10-17 23:58:00         False
63292572  0.2550     141  ZRX-USDT-SWAP 2022-10-17 23:59:00         False
63292573  0.2550     209  ZRX-USDT-SWAP 2022-10-18 00:00:00         False

CodePudding user response:

You can use a groupby.transform('count'):

N = 4

m = df['volume'].ne(0)
               
df['cannot_trade'] = (df.groupby(m.cumsum())['volume']
                        .transform('count').gt(N)
                      & (~m)
                     )

Example:

    volume  cannot_trade
0        0         False
1        0         False
2        0         False
3        1         False
4        0          True
5        0          True
6        0          True
7        0          True
8        2         False
9        1         False
10       0         False
11       1         False

CodePudding user response:

Create groups by compare for not equal 0 with cumualtive sum ad replace NaN for m values, last count values by Series.value_counts with Series.map for new column, last compare for greater or equal by N:

df = pd.DataFrame({'volume':[0,0,1,1,0,0,0,0]})

N = 4
m = df['volume'].ne(0)

s = m.cumsum().mask(m)
df['cannot_trade'] = s.map(s.value_counts()).ge(N)
print (df)
   volume  cannot_trade
0       0         False
1       0         False
2       1         False
3       1         False
4       0          True
5       0          True
6       0          True
7       0          True

Details of solution:

N = 4
m = df['volume'].ne(0)
s = m.cumsum().mask(m)

print (df.assign(m = df['volume'].ne(0),
                 s = m.cumsum().mask(m),
                 count = s.map(s.value_counts()),
                 cannot_trade = s.map(s.value_counts()).ge(N)))
   volume      m    s  count  cannot_trade
0       0  False  0.0    2.0         False
1       0  False  0.0    2.0         False
2       1   True  NaN    NaN         False
3       1   True  NaN    NaN         False
4       0  False  2.0    4.0          True
5       0  False  2.0    4.0          True
6       0  False  2.0    4.0          True
7       0  False  2.0    4.0          True

CodePudding user response:

If i have understood correctlly you could apply a rolling window-sum approach to check if 3 continuos events are equal to 0 and after that apply vanilla python filters. Assuming so is the name of your df and "a" the column you want to check and "b" the boolean column:

so["sum"] = so["a"].rolling(4).sum()

And now the filtering:

so.loc[(so["sum"] == 0) & (so["b"] == True), :]
  • Related