Home > Enterprise >  Pandas Dataframe select row based on a condition and the previous N rows that are previous the condi
Pandas Dataframe select row based on a condition and the previous N rows that are previous the condi

Time:04-05

I have a dataframe and I want to select the rows based on a condition and the previous N rows that are previous the condition.

Example:

pd.DataFrame({'value':[10,20,30,40,50,60,70,80,90],'is_fishing':['NO','NO','YES','NO','YES','NO','NO','NO','YES']})

     value     is_fishing
0     10         NO
1     20         NO
2     30        YES
3     40         NO
4     50        YES
5     60         NO
6     70         NO
7     80         NO
8     90        YES

Expected with N=1 and condition is_fishing=='YES'

     value     is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES

CodePudding user response:

Numpy's split

def n_prior_to_condition(df, n, condition):
    i = np.flatnonzero(condition)   1
    return pd.concat([d.tail(n 1) for d in np.split(df, i)])

n_prior_to_condition(df, 1, df.is_fishing=="YES")

   value is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES

groupby

def n_prior_to_condition(df, n, condition):
    groups = condition.iloc[::-1].cumsum()
    return df.groupby(groups).tail(n 1)

n_prior_to_condition(df, 1, df.is_fishing=="YES")

   value is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES
​
  • Related