I have a pandas dataframe like this:
   first  second
0   True   False
1  False    True
2   True    True
3  False   False
4  False    True
5  False    True
6   True   False
7  False   False
that could be created with the code:
import pandas as pd
df = pd.DataFrame(
    {
        'first': [True, False, True, False, False, False, True, False],
        'second': [False, True, True, False, True, True, False, False]
    }
)
For each row that has True in the first column, I want to find the first row after it where the value in the second column is True.
So the output should be:
   first  second
1  False    True
4  False    True
Also, it's my priority not to use any for-loops.
Do you have any idea how to do this?
CodePudding user response:
You can use:
g = df['first'].ne(df['first'].shift()).cumsum().loc[~df['first']]
out = df[df['second']].groupby(g).head(1)
Output:
   first  second
1  False    True
4  False    True
Intermediate grouper g:
1    2
3    4
4    4
5    4
7    6
Name: first, dtype: int64
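To make the grouper easier to follow, here is a minimal sketch that builds it one step at a time on the question's data. The names changed and run_id are only illustrative and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'first': [True, False, True, False, False, False, True, False],
        'second': [False, True, True, False, True, True, False, False]
    }
)

# True wherever 'first' differs from the previous row, i.e. at the start of each run
changed = df['first'].ne(df['first'].shift())

# Running count of run starts: every row in the same run gets the same id
run_id = changed.cumsum()

# Keep the ids only for rows where 'first' is False (the rows after a True)
g = run_id.loc[~df['first']]
print(g)
```

Each distinct id in g marks one stretch of rows between two True values of 'first', which is why taking the first 'second' row per group gives one result per stretch.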
CodePudding user response:
Another way, without groupby:
out = (df.loc[df.loc[df.any(axis=1), 'first'].shift(fill_value=False)
              .loc[lambda x: x].index])
print(out)
# Output
   first  second
1  False    True
4  False    True
Note: This only works because there is always at least one True value in the second column between two True values in the first column.
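As a quick illustration of that caveat, here is the same expression applied to a small hypothetical frame (df2, not part of the question) in which two True values in 'first' follow each other with no True in 'second' between them.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are both True in 'first',
# and 'second' is not True until row 2
df2 = pd.DataFrame(
    {
        'first': [True, True, False],
        'second': [False, False, True]
    }
)

out = (df2.loc[df2.loc[df2.any(axis=1), 'first'].shift(fill_value=False)
               .loc[lambda x: x].index])
print(out)
```

Here the result also contains row 1, whose 'second' value is False, so the shift-based approach should only be used when the data is known to satisfy the condition in the note.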
CodePudding user response:
Another method:
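first_true_idx = df.loc[df['first']].index
second_true_idx = df.loc[df['second']].index
# For each index in first_true_idx, find the position of the first greater
# index in second_true_idx (-1 if there is none), then drop the -1 entries
df = df.loc[second_true_idx[list(filter(
    lambda x: x >= 0,
    [(second_true_idx > e).tolist().index(True)
     if (second_true_idx > e).any() else -1 for e in first_true_idx]))]]
print(df)
# Output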
   first  second
1  False    True
4  False    True
I believe it should work wherever the True values are in 'second'. For every index in first_true_idx, I look for the first greater index in second_true_idx, which is exactly what you are asking for.
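Since the question asks to avoid loops, the same "first greater index" lookup can probably be done with numpy.searchsorted instead of the list comprehension. This is only a sketch: it assumes a sorted integer index (such as the default RangeIndex), and the np.unique call is my own addition to avoid returning the same row twice when two True values in 'first' share the same next match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'first': [True, False, True, False, False, False, True, False],
        'second': [False, True, True, False, True, True, False, False]
    }
)

# Assumes the index is sorted, as with the default RangeIndex
first_true_idx = df.index[df['first']].to_numpy()
second_true_idx = df.index[df['second']].to_numpy()

# side='right' gives, for each e, the position of the first element
# of second_true_idx that is strictly greater than e
pos = np.searchsorted(second_true_idx, first_true_idx, side='right')

# A position equal to len(second_true_idx) means there is no later match
pos = pos[pos < len(second_true_idx)]

# np.unique removes duplicate matches (an assumption about the desired output)
out = df.loc[np.unique(second_true_idx[pos])]
print(out)
```

On the question's data this selects rows 1 and 4, the same rows as the other answers.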