How to create a dataframe shifting rows by negative 1, including times when an above row may not exist

I have this dataframe:

    id outcome
1    3      no
2    3     yes
3    3      no
4    6      no
5    6     yes
6    6      no
7    7      no
8    7     yes
9    7      no
10   7      no
11   7     yes

However, I would like to create a new one from the row directly above each 'yes' (discarding anything after it).

They must also be grouped by the id column.

This should be the output:

    id outcome
1    3      no
4    6      no
7    7      no
10   7      no

The problem is that if there isn't a row above a 'yes', the following code doesn't work. I also need to group by id. Any suggestions?

import pandas as pd

df = pd.DataFrame(data={
    'id': [3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7],
    'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes','yes','yes','yes','yes','no','no','no','no','yes','no','no','yes']})


pre_df = df[df['outcome'].eq('yes').shift(-1)]

CodePudding user response：

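You can solve this with np.where by flagging every 'no' row whose next row is a 'yes'. Note that this version does not compare ids, so it assumes an id never ends in 'no' right before the next id starts with 'yes'.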

import numpy as np
df['selected_no'] = np.where((df['outcome']=="no")&(df['outcome'].shift(-1)=="yes"),1,0)
>> df[df.selected_no.eq(1)]

    id outcome  selected_no
2    3      no            1
6    5      no            1
10   6      no            1
19   7      no            1
22   7      no            1

CodePudding user response：

Try this:

shifted = df.shift(-1)
selected = (
    df['id'].eq(shifted['id'])      # same id as next row
    & df['outcome'].eq('no')        # current row is no
    & shifted['outcome'].eq('yes')  # next row is yes
)
df[selected]
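
To see why the id comparison matters, here is a minimal sketch on a small made-up frame. The `toy` data and the variable names are hypothetical, not from the question; one id ends in 'no' and the next one starts with 'yes'.

```python
import pandas as pd

# Hypothetical data: id 1 ends with 'no', id 2 starts with 'yes'.
toy = pd.DataFrame({
    'id':      [1, 1, 2, 2, 2],
    'outcome': ['yes', 'no', 'yes', 'no', 'yes'],
})

shifted = toy.shift(-1)  # row i now holds the values of row i + 1

# Without the id check, row 1 is selected even though the 'yes'
# that follows it belongs to id 2.
no_then_yes = toy['outcome'].eq('no') & shifted['outcome'].eq('yes')
naive_rows = toy.index[no_then_yes].tolist()

# With the id check, only a 'no' followed by a 'yes' in the same id is kept.
same_id = toy['id'].eq(shifted['id'])
grouped_rows = toy.index[no_then_yes & same_id].tolist()

print(naive_rows)    # [1, 3]
print(grouped_rows)  # [3]
```

The last row compares against NaN after the shift, so it can never be selected, which is what you want when there is no following row.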

CodePudding user response：

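So I went even simpler and tried this:

df[df.groupby(['id'])['outcome'].shift(-1).eq('yes')]

and got the same answer as above, provided no 'yes' directly follows another 'yes' within an id. Where that can happen, this mask also keeps the earlier 'yes' row, so combine it with df['outcome'].eq('no') to exclude those.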
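
For reference, a runnable sketch of the grouped shift on the dataframe built in the question's code. The names `next_outcome`, `followed_by_yes`, and `no_followed_by_yes` are mine, not from the thread, and the extra 'no' filter is an assumption about what the question wants when several 'yes' rows appear in a row.

```python
import pandas as pd

df = pd.DataFrame(data={
    'id': [3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7],
    'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes',
                'yes','yes','yes','yes','no','no','no','no','yes','no','no','yes'],
})

# Next outcome within the same id; the last row of each id gets NaN,
# so a 'yes' in the following id is never seen.
next_outcome = df.groupby('id')['outcome'].shift(-1)

# Any row whose next row (same id) is 'yes', including 'yes' rows.
followed_by_yes = next_outcome.eq('yes')

# Only 'no' rows whose next row (same id) is 'yes'.
no_followed_by_yes = df['outcome'].eq('no') & followed_by_yes

print(df.index[followed_by_yes].tolist())     # [2, 6, 7, 10, 11, 12, 13, 14, 19, 22]
print(df.index[no_followed_by_yes].tolist())  # [2, 6, 10, 19, 22]
```

Grouping before the shift replaces the explicit id comparison, since the shift never crosses from one id into the next.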