Home > OS >  Find first row after a specific row with higher value in a column in pandas
Find first row after a specific row with higher value in a column in pandas

Time:07-20

I have a pandas dataframe like this:

    first   second   third
0     2       2      False
1     3       1      True
2     1       4      False
3     0       6      False
4     5       7      True
5     4       2      False
6     3       4      False
7     3       6      True

and it could be created with the code:

import pandas as pd

df = pd.DataFrame(
    {
        'first': [2, 3, 1, 0, 5, 4, 3, 3], 
        'second': [2, 1, 4, 6, 7, 2, 4, 6], 
        'third': [False, True, False, False, True, False, False, True]
    }
)

For any row with a True value in the third column, I want to find the first row in the next rows which has a value in the second column greater than the value in the first column.

So the output should be:

    first   second   third
2     1       4      False
6     3       4      False

And also it is my priority not to use any for-loop.

Have you any idea about this?

CodePudding user response:

You can try

m = df['third'].cumsum()

out = (df[m.gt(0) & (~df['third'])] # filter out heading False row and the middle True row
       .groupby(m, as_index=False)
       # select the first row that value in the second column greater than in the first column
       .apply(lambda g: g[g['second'].gt(g['first'])].iloc[:1]))
print(out)

   first  second  third
0      1       4  False
1      3       4  False

CodePudding user response:

One approach, using numpy.searchsorted (for performance):

import numpy as np

# find all indices where first < second
m = df.index[df['first'] < df['second']]

# find all indices, but the last, where third is True
r = df.index[:-1][df.iloc[:-1]['third']]

# use searchsorted to find in O(logn) the next row where first < second
res = df.iloc[m[np.searchsorted(m, r, side="right")]]
print(res)

Output

   first  second  third
2      1       4  False
6      3       4  False
  • Related