Pandas Dataframe select row based on a condition and the previous N rows that are previous the condi-CodePudding

I have a dataframe and I want to select the rows based on a condition and the previous N rows that are previous the condition.

Example:

pd.DataFrame({'value':[10,20,30,40,50,60,70,80,90],'is_fishing':['NO','NO','YES','NO','YES','NO','NO','NO','YES']})

     value     is_fishing
0     10         NO
1     20         NO
2     30        YES
3     40         NO
4     50        YES
5     60         NO
6     70         NO
7     80         NO
8     90        YES

Expected with N=1 and condition is_fishing=='YES'

     value     is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES

CodePudding user response：

Numpy's `split`

def n_prior_to_condition(df, n, condition):
    i = np.flatnonzero(condition)   1
    return pd.concat([d.tail(n 1) for d in np.split(df, i)])

n_prior_to_condition(df, 1, df.is_fishing=="YES")

   value is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES

`groupby`

def n_prior_to_condition(df, n, condition):
    groups = condition.iloc[::-1].cumsum()
    return df.groupby(groups).tail(n 1)

n_prior_to_condition(df, 1, df.is_fishing=="YES")

   value is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES

Numpy's split

groupby

Numpy's `split`

`groupby`