Find a row meeting conditions at most in n next rows in pandas

Time:07-21

I have a pandas dataframe like this:

    close   low     max_close   higher_low
0     2      1        True        False
1     3      4        False       True
2     1      2        True        False
3     0      3        False       False
4     5      2        False       True
5     4      5        False       True
6     3      3        True        False
7     6      7        False       True

It can be created with the following code:

import pandas as pd

df = pd.DataFrame(
    {
        'close': [2, 3, 1, 0, 5, 4, 3, 6], 
        'low': [1, 4, 2, 3, 2, 5, 3, 7], 
        'max_close': [True, False, True, False, False, False, True, False], 
        'higher_low': [False, True, False, False, True, True, False, True]
    }
)

For each row where max_close is True, I want to find the first subsequent row where higher_low is True and low is greater than close. That row must also be within the next 2 rows after the row where max_close was True.

So the output should be:

    close   low     max_close   higher_low
1     3      4        False        True
7     6      7        False        True

(Index 4 is not in the output because in this row: low < close. Also, index 5 is not in the output because it's three rows after index 2, while we have a condition that it should be at most in the next 2 rows.)

Also, I would strongly prefer not to use any for-loops in the code.

Have you any idea about this?

CodePudding user response:

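Use:

lookup = 2

indices = []
# Index labels of the rows where max_close is True (assumes the default 0..n-1 index)
starts = df.index[df['max_close']]
for i in range(1, lookup + 1):
    # Rows i steps after each max_close row, dropping labels past the end of df
    shifted = starts + i
    tmp = df.loc[shifted[shifted <= df.index[-1]]]
    tmp_ind = tmp[(tmp['higher_low']) & (tmp['low'] > tmp['close'])].index
    indices += tmp_ind.tolist()
# Recent pandas versions reject a set as a .loc indexer, so pass a sorted list
df.loc[sorted(set(indices))]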

Output

   close  low  max_close  higher_low
1      3    4      False        True
7      6    7      False        True
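
Since the question asks to avoid for-loops, the same look-ahead idea can probably be expressed with shift and rolling instead. This is only a sketch, assuming the default integer index. The names prev_flag, in_window and cond are made up for illustration. Like the loop above, it returns every matching row inside the window, not only the first one.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'close': [2, 3, 1, 0, 5, 4, 3, 6],
        'low': [1, 4, 2, 3, 2, 5, 3, 7],
        'max_close': [True, False, True, False, False, False, True, False],
        'higher_low': [False, True, False, False, True, True, False, True],
    }
)

lookup = 2

# 1 where the row directly above has max_close == True, else 0
prev_flag = df['max_close'].astype(int).shift(1, fill_value=0)
# True where any of the previous `lookup` rows has max_close == True
in_window = prev_flag.rolling(lookup, min_periods=1).max().astype(bool)
# The two row-level conditions from the question
cond = df['higher_low'] & (df['low'] > df['close'])

result = df[in_window & cond]
print(result)
```

On the sample data this selects the rows with index 1 and 7.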

CodePudding user response:

Create virtual groups from the max_close column with cumsum, then keep the first 3 rows of each group (the max_close row itself and the 2 rows that follow). Finally, filter on your two conditions:

out = (df.groupby(df['max_close'].cumsum()).head(3)
         .query("higher_low & (close < low)"))
print(out)

# Output
   close  low  max_close  higher_low
1      3    4      False        True
7      6    7      False        True
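
If only the first matching row per max_close row is wanted, as the question words it, the grouping idea above could be extended roughly as follows. This is a hedged sketch, not a tested general solution. The names n, group, pos, mask and first_hits are illustrative. It also skips the max_close row itself and any rows that come before the first max_close row.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'close': [2, 3, 1, 0, 5, 4, 3, 6],
        'low': [1, 4, 2, 3, 2, 5, 3, 7],
        'max_close': [True, False, True, False, False, False, True, False],
        'higher_low': [False, True, False, False, True, True, False, True],
    }
)

n = 2  # how many rows after a max_close row may be considered

# Group id that increases by one at every max_close row
group = df['max_close'].cumsum()
# Position inside each group: 0 is the max_close row, 1..n are its followers
pos = df.groupby(group).cumcount()

mask = (
    (group > 0)            # ignore rows before the first max_close row
    & pos.between(1, n)    # only the n rows after the max_close row
    & df['higher_low']
    & (df['low'] > df['close'])
)

# Keep at most one hit (the earliest) per max_close row
first_hits = df[mask].groupby(group[mask]).head(1)
print(first_hits)
```

One caveat of grouping by cumsum: when two max_close rows are closer together than n rows, the second one starts a new group, so the window of the first one is cut short.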