I have this dataframe:
    id outcome
1    3      no
2    3     yes
3    3      no
4    6      no
5    6     yes
6    6      no
7    7      no
8    7     yes
9    7      no
10   7      no
11   7     yes
However, I would like to create a new one that keeps only the row directly above each 'yes' (dismissing anything after).
The rows must also be grouped by the id column.
This should be the output:
    id outcome
1    3      no
4    6      no
7    7      no
10   7      no
The problem is that if there isn't a row above a 'yes', the following code doesn't work. I also need to group by id. Any suggestions?
import pandas as pd

df = pd.DataFrame(data={
    'id': [3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7],
    'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes','yes','yes','yes','yes','no','no','no', 'no', 'yes', 'no', 'no', 'yes'],
})
pre_df = df[df['outcome'].eq('yes').shift(-1)]
CodePudding user response:
You can use np.where to flag each 'no' row that is immediately followed by a 'yes':
import numpy as np
df['selected_no'] = np.where((df['outcome']=="no")&(df['outcome'].shift(-1)=="yes"),1,0)
>>> df[df.selected_no.eq(1)]
    id outcome  selected_no
2    3      no            1
6    5      no            1
10   6      no            1
19   7      no            1
22   7      no            1
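One caveat with a plain shift is that it compares each row with the next row regardless of id. The sketch below uses a made-up four-row frame (toy is hypothetical, not from the question) in which the last row of one id is 'no' and the first row of the next id is 'yes'. It only illustrates the difference between a global shift and a per-id shift via groupby.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: id 1 ends with 'no', id 2 starts with 'yes'.
toy = pd.DataFrame({"id": [1, 1, 2, 2], "outcome": ["no", "no", "yes", "no"]})

# Global shift: the "next row" may belong to a different id.
global_flag = np.where(
    (toy["outcome"] == "no") & (toy["outcome"].shift(-1) == "yes"), 1, 0
)

# Per-id shift: the last row of each id has no next row, so it is never flagged.
next_in_group = toy.groupby("id")["outcome"].shift(-1)
grouped_flag = np.where(
    (toy["outcome"] == "no") & (next_in_group == "yes"), 1, 0
)

print(global_flag.tolist())   # [0, 1, 0, 0]
print(grouped_flag.tolist())  # [0, 0, 0, 0]
```

On the data posted in the question no such boundary case occurs, which is why the output above is still the expected one.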
CodePudding user response:
Try this:
shifted = df.shift(-1)
selected = (
df['id'].eq(shifted['id']) # same id as next row
& df['outcome'].eq('no') # current row is no
& shifted['outcome'].eq('yes') # next row is yes
)
df[selected]
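As a quick check, the same mask can be run against the small example table from the question. The frame below is retyped from that table (index 1 to 11), and the name small is used only for this illustration.

```python
import pandas as pd

# Retyped from the example table in the question; the index starts at 1 as shown there.
small = pd.DataFrame(
    {
        "id": [3, 3, 3, 6, 6, 6, 7, 7, 7, 7, 7],
        "outcome": ["no", "yes", "no", "no", "yes", "no",
                    "no", "yes", "no", "no", "yes"],
    },
    index=range(1, 12),
)

shifted = small.shift(-1)  # each row now holds the values of the row below it
selected = (
    small["id"].eq(shifted["id"])       # same id as next row
    & small["outcome"].eq("no")         # current row is no
    & shifted["outcome"].eq("yes")      # next row is yes
)
result = small[selected]
print(result.index.tolist())  # [1, 4, 7, 10]
```

These are the four rows listed as the desired output in the question.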
CodePudding user response:
So I went even simpler and tried this:
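df[df.groupby('id')['outcome'].shift(-1).eq('yes')]
and got the same answer as above. (If a 'yes' can directly follow another 'yes' within an id, also require df['outcome'].eq('no'), otherwise those 'yes' rows are selected too.)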
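A minimal sketch of a grouped shift on the 24-row frame from the question, assuming that a per-id shift is what the line above intends. The names next_outcome, loose and strict are illustrative. It compares the mask with and without the extra 'no' condition, since this data contains runs of consecutive 'yes' values.

```python
import pandas as pd

df = pd.DataFrame(data={
    'id': [3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7],
    'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes',
                'yes','yes','yes','yes','no','no','no','no','yes','no','no','yes'],
})

# Outcome of the following row within the same id (missing for the last row of each id).
next_outcome = df.groupby('id')['outcome'].shift(-1)

# Without the 'no' condition, a 'yes' followed by another 'yes' is kept as well.
loose = df[next_outcome.eq('yes')]

# With the 'no' condition, only 'no' rows directly above a 'yes' remain.
strict = df[df['outcome'].eq('no') & next_outcome.eq('yes')]

print(loose.index.tolist())   # [2, 6, 7, 10, 11, 12, 13, 14, 19, 22]
print(strict.index.tolist())  # [2, 6, 10, 19, 22]
```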