Pandas - starting iteration index and slicing with .loc

I'm still quite new to Python and programming in general. With luck, I have the right idea, but I can't quite get this to work.

With my example df, I want iteration to start at the row where entry == 1.

import pandas as pd
import numpy as np

nan = np.nan

a = [0,0,4,4,4,4,6,6]
b = [4,4,4,4,4,4,4,4]
entry = [nan,nan,nan,nan,1,nan,nan,nan]

# build the frame directly from the three lists
df = pd.DataFrame({'a': a, 'b': b, 'entry': entry})

I wrote a function, with little success. It raises an error: unhashable type: 'slice'. FWIW, I'm applying this function to groups of various lengths.

def exit_row(df):

    start = df.index[df.entry == 1]

    df.loc[start:,(df.a > df.b), 'exit'] = 1

    return df

Ideally, the result would be as below:

   a  b  entry  exit
0  0  4    NaN   NaN
1  0  4    NaN   NaN
2  4  4    NaN   NaN
3  4  4    NaN   NaN
4  4  4    1.0   NaN
5  4  4    NaN   NaN
6  6  4    NaN     1
7  6  4    NaN     1

Any advice is much appreciated. I had wondered if I should attempt a for loop instead, though I often find them difficult to read.

CodePudding user response:

You can use boolean indexing:

# which rows are at or after the entry row?
m1 = df['entry'].notna().cummax()
# in which rows is a>b?
m2 = df['a'].gt(df['b'])

# set 1 where both conditions are True
df.loc[m1&m2, 'exit'] = 1

Output:

   a  b  entry  exit
0  0  4    NaN   NaN
1  0  4    NaN   NaN
2  4  4    NaN   NaN
3  4  4    NaN   NaN
4  4  4    1.0   NaN
5  4  4    NaN   NaN
6  6  4    NaN   1.0
7  6  4    NaN   1.0
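
If you want to keep the shape of the exit_row function from the question, the same masks can be wrapped as below. This is only a sketch: the names after_entry and a_above_b are my own, it tests entry == 1 literally (the notna() check above gives the same result on this data), and it works on a copy so the input frame is left untouched. As far as I can tell, the original attempt fails because .loc on a DataFrame takes at most a row indexer and a column indexer, and df.index[df.entry == 1] is an Index rather than a single label, so it cannot be used as a slice start.

```python
import numpy as np
import pandas as pd


def exit_row(df):
    """Set exit = 1 on rows at or after the first entry == 1 where a > b."""
    out = df.copy()
    # True from the entry row onward (cummax keeps the first True "switched on")
    after_entry = out["entry"].eq(1).cummax()
    # True where a is strictly greater than b
    a_above_b = out["a"].gt(out["b"])
    # create the column up front so it exists even when no row qualifies
    out["exit"] = np.nan
    out.loc[after_entry & a_above_b, "exit"] = 1
    return out


df = pd.DataFrame({
    "a": [0, 0, 4, 4, 4, 4, 6, 6],
    "b": [4, 4, 4, 4, 4, 4, 4, 4],
    "entry": [np.nan] * 4 + [1] + [np.nan] * 3,
})

result = exit_row(df)
print(result)
```

Only rows 6 and 7 end up with exit set, matching the table above.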

Intermediates:

   a  b  entry  notna     m1     m2  m1&m2  exit
0  0  4    NaN  False  False  False  False   NaN
1  0  4    NaN  False  False  False  False   NaN
2  4  4    NaN  False  False  False  False   NaN
3  4  4    NaN  False  False  False  False   NaN
4  4  4    1.0   True   True  False  False   NaN
5  4  4    NaN  False   True  False  False   NaN
6  6  4    NaN  False   True   True   True   1.0
7  6  4    NaN  False   True   True   True   1.0
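
Since the question mentions applying this to groups of various lengths, here is a rough sketch of a per-group variant. The column name group and the sample values are made up for illustration, and I am assuming each group should start "fresh", i.e. an entry in one group must not switch on the rows of the next group. The cumulative maximum is taken per group, on integers, and cast back to a boolean mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": ["x"] * 4 + ["y"] * 4,   # hypothetical grouping column
    "a": [0, 5, 5, 5, 5, 0, 5, 5],
    "b": [4] * 8,
    "entry": [np.nan, np.nan, 1, np.nan, np.nan, 1, np.nan, np.nan],
})

# 1 on entry rows, 0 elsewhere
entered = df["entry"].eq(1).astype(int)
# per-group running maximum: True from each group's entry row onward
after_entry = entered.groupby(df["group"]).cummax().astype(bool)

df["exit"] = np.nan
df.loc[after_entry & df["a"].gt(df["b"]), "exit"] = 1
print(df)
```

Row 4 (the first row of group y) has a > b but comes before that group's entry, so it stays NaN. With a plain, ungrouped cummax it would have been flagged because of the entry in group x.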