Find next first row meeting a condition after a specific row in pandas

Time:07-21

I have a pandas dataframe like this:

    first   second
0   True    False
1   False   True
2   True    True
3   False   False
4   False   True
5   False   True
6   True    False
7   False   False

that could be created with the code:

import pandas as pd

df = pd.DataFrame(
    {
        'first': [True, False, True, False, False, False, True, False], 
        'second': [False, True, True, False, True, True, False, False]
    }
)

For every row with a True value in the first column, I want to find the first subsequent row where the value of the second column is True.

So the output should be:

    first   second
1   False   True
4   False   True

Also, I would strongly prefer not to use any for-loops.

Do you have any ideas?

CodePudding user response:

You can use:

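# Number the runs of consecutive equal values in 'first'; keep the labels only where 'first' is False
g = df['first'].ne(df['first'].shift()).cumsum().loc[~df['first']]

# Among the rows where 'second' is True, keep the first row of each run
out = df[df['second']].groupby(g).head(1)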

Output:

   first  second
1  False    True
4  False    True

Intermediate grouper g:

1    2
3    4
4    4
5    4
7    6
Name: first, dtype: int64
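To make the grouper easier to follow, here is a minimal sketch that builds it one step at a time on the question's data. The intermediate names run_start and run_id are introduced here for illustration only and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'first': [True, False, True, False, False, False, True, False],
        'second': [False, True, True, False, True, True, False, False],
    }
)

# True wherever 'first' differs from the previous row, i.e. at the start of a run.
run_start = df['first'].ne(df['first'].shift())

# The cumulative sum numbers the runs: 1, 2, 3, 4, 4, 4, 5, 6.
run_id = run_start.cumsum()

# Keep run numbers only for rows where 'first' is False (index 1, 3, 4, 5, 7).
g = run_id.loc[~df['first']]

# Rows where 'second' is True, grouped by run number; the first row of each
# group is the first match after a True in 'first'. Rows with no label in g
# (here row 2) belong to no group.
out = df[df['second']].groupby(g).head(1)
print(out)
```

Each run of False values in 'first' follows a True value, so taking the first 'second' row per run gives one result per True in 'first' whenever such a row exists inside that run.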

CodePudding user response:

Another way without groupby:

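# Keep rows with at least one True, shift 'first' down by one row, and select the rows it then flags
out = (df.loc[df.loc[df.any(axis=1), 'first'].shift(fill_value=False)
         .loc[lambda x: x].index])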
print(out)

# Output
   first  second
1  False    True
4  False    True

Note: This only works because there is always at least one True value in the second column between two True values in the first column.
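A small sketch of what happens when that condition does not hold. The frame bad is hypothetical data made up for this illustration: it has two consecutive True values in 'first' with no True in 'second' between them.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are both True in 'first',
# and the only True in 'second' is in row 2.
bad = pd.DataFrame(
    {
        'first': [True, True, False, False],
        'second': [False, False, True, False],
    }
)

# Same expression as in the answer, applied to the hypothetical frame.
out = (bad.loc[bad.loc[bad.any(axis=1), 'first'].shift(fill_value=False)
         .loc[lambda x: x].index])
print(out)
```

Here the result contains rows 1 and 2, although row 1 has False in 'second'. The shift simply selects whatever row follows a True in 'first', so it relies on that row being a True in 'second'.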

CodePudding user response:

Another method:

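# Index labels of the True values in each column
first_true_idx = df.loc[df['first']].index
second_true_idx = df.loc[df['second']].index
# For each True in 'first', position of the first later True in 'second' (-1 if there is none)
df = df.loc[second_true_idx[list(filter(
     lambda x: x >= 0, [(second_true_idx > e).tolist().index(True)
                if (second_true_idx > e).any() else -1 for e in first_true_idx]))]]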

print(df)

# Output
   first  second
1  False    True
4  False    True

I believe this works regardless of where the True values in 'second' are. For every index in the 'first' True index, I look for the first greater index in the 'second' True index, which is exactly what you are asking for.
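The same "first greater index" lookup can probably be done without the list comprehension. The following is a sketch using NumPy's searchsorted, which is a different technique from the code above, not the answerer's method. It works on row positions rather than index labels, and the names first_pos, second_pos and nxt are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'first': [True, False, True, False, False, False, True, False],
        'second': [False, True, True, False, True, True, False, False],
    }
)

# Row positions (0-based) of the True values in each column; both are sorted.
first_pos = np.flatnonzero(df['first'].to_numpy())
second_pos = np.flatnonzero(df['second'].to_numpy())

# For each position in first_pos, side='right' gives the location in
# second_pos of the first entry that is strictly greater.
nxt = np.searchsorted(second_pos, first_pos, side='right')

# A value equal to len(second_pos) means there is no later True in 'second'.
nxt = nxt[nxt < len(second_pos)]

out = df.iloc[second_pos[nxt]]
print(out)
```

On the question's data this selects rows 1 and 4. If several True values in 'first' share the same next row, that row appears more than once; wrapping nxt in np.unique would be one way to drop the repeats.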
