Consecutive rows meeting a condition in pandas

Time:07-22

I have a pandas dataframe like this:

    col_name
0       -1
1       -1
2       -3
3       2
4       1
5       -3
6       -2
7       4
8       3
9       5

that could be created with the code:

import pandas as pd

df = pd.DataFrame(
    {
        'col_name': [-1, -1, -3, 2, 1, -3, -2, 4, 3, 5]
    }
)

I want to find each row where the row itself and the x rows before it all have positive values, the y rows before those x rows all have negative values, and the last of these y rows (which is actually y rows before the current row) has the least value compared to the k rows before it.

So, for x=1, y=2 and k=2 the output is:

    col_name
4       1

(Index 8 is not in the output: although it and the row before it have positive values, and the two rows before them have negative values, the last row with a negative value, index 6, does not have the least value compared to the two rows before it.)

Also, I would strongly prefer not to use any for-loops in the code.

Have you any idea about this?

CodePudding user response:

Your explanation is not very clear, so I'll put a base solution here; feel free to modify it to your needs. It should not be hard to adjust.

We can achieve that by shifting the series and applying successive boolean masks.

First, create your shifts:

m = df.assign(**{f'col_name_shift_{i}': df.col_name.shift(i)
                 for i in range(1, x + y + 1)})

Note that the loop here is very small (only x + y = 3 iterations for x=1, y=2). This gives:

   col_name  col_name_shift_1  col_name_shift_2  col_name_shift_3
0        -1               NaN               NaN               NaN
1        -1              -1.0               NaN               NaN
2        -3              -1.0              -1.0               NaN
3         2              -3.0              -1.0              -1.0
4         1               2.0              -3.0              -1.0
5        -3               1.0               2.0              -3.0
6        -2              -3.0               1.0               2.0
7         4              -2.0              -3.0               1.0
8         3               4.0              -2.0              -3.0
9         5               3.0               4.0              -2.0

Now it's just a matter of checking, row by row, which rows meet your requirements.

For example,

I want to find the rows that x rows before them and the row itself have positive values

m1 = m.iloc[:, range(x + 1)] > 0

and y rows before those x rows have negative values

m2 = m.iloc[:, range(x + 1, x + y + 1)] < 0

and also the last row of these y rows which is actually y rows before the current row has the least value compared to k rows before of it.

m3 = m.iloc[:, range(y + 1, y + k)].gt(m.iloc[:, y], axis=0)
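
As a cross-check on this last condition, here is a minimal sketch that tests it with a rolling minimum instead of shifted columns. It rests on two assumptions that are my reading of the question, not something the question states outright: the row being tested is the negative row closest to the positive run, i.e. x + 1 rows before the current row (which equals y in the example), and "least" means strictly smaller than each of the k rows before it. The names prev_min, is_lowest and anchor_is_lowest are made up for illustration.

```python
import pandas as pd

s = pd.Series([-1, -1, -3, 2, 1, -3, -2, 4, 3, 5], name="col_name")
x, k = 1, 2

# Minimum of the k rows strictly before each row
# (NaN where fewer than k earlier rows exist).
prev_min = s.shift(1).rolling(k).min()

# True where a row is strictly below all k rows before it.
# Comparisons against NaN evaluate to False.
is_lowest = s < prev_min

# Move the flag forward so it lines up with the "current" row,
# which sits x + 1 rows after the row being tested (assumption).
anchor_is_lowest = is_lowest.shift(x + 1, fill_value=False)

print(anchor_is_lowest[anchor_is_lowest].index.tolist())
```

For the sample data the flag is set at index 4 but not at index 8, which agrees with the explanation in the question.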

Then, you concatenate all your boolean series,

mask = pd.concat([m1, m2, m3], axis=1)

and keep the rows where every condition holds:

df.loc[mask.all(axis=1)]