I'm still quite new to Python and programming in general. Hopefully I have the right idea, but I can't quite get this to work.
In my example df, I want to start from the row where entry == 1 and, from that row onward, set exit = 1 wherever a > b.
import pandas as pd
import numpy as np
nan = np.nan
a = [0,0,4,4,4,4,6,6]
b = [4,4,4,4,4,4,4,4]
entry = [nan,nan,nan,nan,1,nan,nan,nan]
df = pd.DataFrame({'a': a, 'b': b, 'entry': entry})
I wrote the function below, without success: it raises the error unhashable type: 'slice'. For what it's worth, I'm applying this function to groups of various lengths.
def exit_row(df):
    start = df.index[df.entry == 1]
    df.loc[start:, (df.a > df.b), 'exit'] = 1
    return df
Ideally, the result would be as below:
   a  b  entry  exit
0  0  4    NaN   NaN
1  0  4    NaN   NaN
2  4  4    NaN   NaN
3  4  4    NaN   NaN
4  4  4    1.0   NaN
5  4  4    NaN   NaN
6  6  4    NaN     1
7  6  4    NaN     1
Any advice is much appreciated. I had wondered whether I should try a for loop instead, though I often find loops difficult to read.
Answer:
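You can use boolean indexing instead of a loop: build one mask for "at or after the entry row" and one for "a > b", then combine them: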
# which rows are at or after the first non-missing entry?
m1 = df['entry'].notna().cummax()
# in which rows is a>b?
m2 = df['a'].gt(df['b'])
# set 1 where both conditions are True
df.loc[m1&m2, 'exit'] = 1
output:
   a  b  entry  exit
0  0  4    NaN   NaN
1  0  4    NaN   NaN
2  4  4    NaN   NaN
3  4  4    NaN   NaN
4  4  4    1.0   NaN
5  4  4    NaN   NaN
6  6  4    NaN   1.0
7  6  4    NaN   1.0
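The snippet above relies on df from the question. Here is a self-contained sketch that rebuilds the example frame and applies the same two masks. It uses eq(1) instead of notna() for the start mask, which follows the question's entry == 1 condition literally; with this data both give the same result, but they would differ if entry could hold other non-missing values. An explicit for loop is included only for comparison with the vectorised version; the names started, a_gt_b and exit_loop are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 4, 4, 4, 4, 6, 6],
    "b": [4, 4, 4, 4, 4, 4, 4, 4],
    "entry": [np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
})

# True from the first row where entry == 1 onward (cummax keeps it True)
started = df["entry"].eq(1).cummax()
# True wherever a is strictly greater than b
a_gt_b = df["a"].gt(df["b"])

# Creates the 'exit' column; rows outside the combined mask stay NaN
df.loc[started & a_gt_b, "exit"] = 1
print(df)

# The same logic as an explicit loop, for comparison only
exit_loop = []
is_started = False
for a_val, b_val, entry_val in zip(df["a"], df["b"], df["entry"]):
    if entry_val == 1:  # NaN == 1 is False, so only the entry row flips the flag
        is_started = True
    exit_loop.append(1.0 if is_started and a_val > b_val else np.nan)
print(exit_loop)
```

The loop makes the "switch on at the entry row" state explicit, which may be easier to follow at first; the masks express the same rule without Python-level iteration.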
Intermediates:
   a  b  entry  notna     m1     m2  m1&m2  exit
0  0  4    NaN  False  False  False  False   NaN
1  0  4    NaN  False  False  False  False   NaN
2  4  4    NaN  False  False  False  False   NaN
3  4  4    NaN  False  False  False  False   NaN
4  4  4    1.0   True   True  False  False   NaN
5  4  4    NaN  False   True  False  False   NaN
6  6  4    NaN  False   True   True   True   1.0
7  6  4    NaN  False   True   True   True   1.0
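The question also mentions applying the function to groups of various lengths. Two observations on the original exit_row: df.loc takes at most a row indexer and a column indexer, whereas the original passes three, which is most likely why it fails; and df.index[df.entry == 1] is an Index, not a single label. The sketch below rewrites exit_row with the answer's masks so it can run on one group at a time, and adds a vectorised variant that builds the start mask per group, so an entry in one group does not switch on rows of the next. The grouping column name group, the helper mark_exits and the sample data are all hypothetical.

```python
import numpy as np
import pandas as pd


def exit_row(df: pd.DataFrame) -> pd.DataFrame:
    """Set exit = 1 at or after the entry == 1 row wherever a > b (one group)."""
    out = df.copy()  # leave the caller's frame untouched
    started = out["entry"].eq(1).cummax()
    out.loc[started & out["a"].gt(out["b"]), "exit"] = 1
    return out


def mark_exits(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Hypothetical vectorised variant; 'by' names the grouping column."""
    # Running count of entry == 1 rows within each group; > 0 means "started"
    started = df["entry"].eq(1).astype(int).groupby(df[by]).cumsum().gt(0)
    out = df.copy()
    out.loc[started & out["a"].gt(out["b"]), "exit"] = 1
    return out


# Hypothetical data: two groups of different lengths
df = pd.DataFrame({
    "group": ["x"] * 5 + ["y"] * 3,
    "a": [0, 4, 6, 6, 6, 6, 6, 6],
    "b": [4, 4, 4, 4, 4, 4, 4, 4],
    "entry": [np.nan, 1, np.nan, np.nan, np.nan, np.nan, 1, np.nan],
})

# One group at a time, mirroring the question's per-group approach
per_group = pd.concat([exit_row(g) for _, g in df.groupby("group")])
# All groups in one pass
res = mark_exits(df, "group")
print(res)
```

Row 5 stays NaN in both results even though a > b there, because group y has not reached its own entry row yet; a single ungrouped cummax would have carried group x's entry into it.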