I have a dataframe and I want to select the rows based on a condition and the previous N rows that are previous the condition.
Example:
pd.DataFrame({'value':[10,20,30,40,50,60,70,80,90],'is_fishing':['NO','NO','YES','NO','YES','NO','NO','NO','YES']})
value is_fishing
0 10 NO
1 20 NO
2 30 YES
3 40 NO
4 50 YES
5 60 NO
6 70 NO
7 80 NO
8 90 YES
Expected with N=1 and condition is_fishing=='YES'
value is_fishing
1 20 NO
2 30 YES
3 40 NO
4 50 YES
7 80 NO
8 90 YES
CodePudding user response:
Numpy's split
def n_prior_to_condition(df, n, condition):
i = np.flatnonzero(condition) 1
return pd.concat([d.tail(n 1) for d in np.split(df, i)])
n_prior_to_condition(df, 1, df.is_fishing=="YES")
value is_fishing
1 20 NO
2 30 YES
3 40 NO
4 50 YES
7 80 NO
8 90 YES
groupby
def n_prior_to_condition(df, n, condition):
groups = condition.iloc[::-1].cumsum()
return df.groupby(groups).tail(n 1)
n_prior_to_condition(df, 1, df.is_fishing=="YES")
value is_fishing
1 20 NO
2 30 YES
3 40 NO
4 50 YES
7 80 NO
8 90 YES