I have the following dataframe:
    id outcome
0    3      no
1    3      no
2    3      no
3    3     yes
4    3      no
5    5      no
6    5      no
7    5     yes
8    5     yes
9    6      no
10   6      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no
18   6      no
19   7      no
20   7      no
21   7     yes
22   7     yes
23   7      no
24   7      no
25   7      no
26   7     yes
It is grouped by id and sorted by date in ascending order.
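Because both the masks below and the answer rely on `groupby('id')` followed by `shift`, the row order matters: `shift` moves values within each id in the current row order and never across an id boundary. A minimal illustration, assuming only rows 5 to 10 of the frame above (the boundary between id 5 and id 6):

```python
import pandas as pd

# Rows 5-10 of the frame above: the last rows of id 5 and the first rows of id 6.
df = pd.DataFrame({'id': [5, 5, 5, 5, 6, 6],
                   'outcome': ['no', 'no', 'yes', 'yes', 'no', 'no']},
                  index=range(5, 11))

g = df.groupby('id')['outcome']
prev_outcome = g.shift()    # previous row's outcome within the same id (NaN for the first row)
next_outcome = g.shift(-1)  # next row's outcome within the same id (NaN for the last row)

print(pd.DataFrame({'outcome': df['outcome'],
                    'prev': prev_outcome,
                    'next': next_outcome}))
```

Row 8 (the last row of id 5) gets NaN as its "next" value rather than row 9's 'no', so the last row of one id is never compared with the first row of the next id.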
There are a few conditions I want to satisfy.
I want to remove a row if the row after it (within the same id) has the same outcome.
If a row is 'yes', then the next row must be the first 'no' after it.
Additionally, I want to keep the last 'no' before a 'yes'. So a run of 'no' values can contribute two rows: its first 'no' (right after a 'yes') and its last 'no' (right before the next 'yes').
This is the desired output for the above dataframe:
    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
8    5     yes
10   6      no
15   6     yes
16   6      no
20   7      no
22   7     yes
23   7      no
25   7      no
26   7     yes
At the moment I have created several masks like this:
import pandas as pd
df = pd.DataFrame(data={'id': [3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7],
                        'outcome': ['no','no','no','yes','no',
                                    'no','no','yes','yes',
                                    'no','no','yes','yes','yes','yes','yes','no','no','no',
                                    'no','no','yes','yes','no','no','no','yes']})
m1 = df['outcome']  # the outcome column (a Series)
m2 = m1.groupby(df['id']).shift(-1)  # next row's outcome within the same id (values shifted up by 1)
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')  # True for a 'no' directly after a 'yes'
df2 = df[~m1.eq(m2) | m3]  # keep rows whose outcome differs from the next row's, plus those first 'no's
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()  # previous kept row's outcome within the same id
df3 = df2[~m4.eq(m5)]  # drop rows that repeat the previous kept row's outcome
With the above, however, I do not keep both the first and the last 'no' above a 'yes'.
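Tracing the masks on the 27-row frame shown at the top, the first pass (`df2`) still contains both row 23 and row 25 of id 7. The second pass (`m4`/`m5`) then drops any row whose previous kept row in the same id has the same outcome, so when both the first and the last 'no' of a run survive the first pass, the last one is removed. A sketch that reruns the masks and lists the desired rows that go missing, assuming the frame is built from the table at the top:

```python
import pandas as pd

# The 27-row frame from the top of the question.
df = pd.DataFrame({
    'id': [3]*5 + [5]*4 + [6]*10 + [7]*8,
    'outcome': ['no', 'no', 'no', 'yes', 'no',
                'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no',
                'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'yes'],
})

# First pass from the question.
m1 = df['outcome']
m2 = m1.groupby(df['id']).shift(-1)
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')
df2 = df[~m1.eq(m2) | m3]

# Second pass from the question.
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()
df3 = df2[~m4.eq(m5)]

# Row labels of the desired output listed in the question.
desired = [2, 3, 4, 6, 8, 10, 15, 16, 20, 22, 23, 25, 26]
missing = sorted(set(desired) - set(df3.index))
print(missing)  # [25]
```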
Answer:
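You are on the right track.
g = df.groupby('id')['outcome']
# rows whose previous or next row in the same id is 'yes'
cond1 = g.shift().eq('yes') | g.shift(-1).eq('yes')
# keep the 'no' rows among those, plus the first 'yes' of each run of 'yes' values
out = df[(cond1 & df['outcome'].ne('yes')) | (df['outcome'].eq('yes') & g.shift().ne('yes'))]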
    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
7    5     yes
10   6      no
11   6     yes
16   6      no
20   7      no
21   7     yes
23   7      no
25   7      no
26   7     yes
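Note that this output keeps the first 'yes' of each run (rows 7, 11 and 21), while the desired output in the question keeps the last one (rows 8, 15 and 22). Checking each 'yes' row against the next row instead of the previous one appears to reproduce the desired output exactly. A sketch, assuming the 27-row frame shown at the top of the question (the mask names are illustrative only):

```python
import pandas as pd

# The 27-row frame from the top of the question.
df = pd.DataFrame({
    'id': [3]*5 + [5]*4 + [6]*10 + [7]*8,
    'outcome': ['no', 'no', 'no', 'yes', 'no',
                'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no',
                'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'yes'],
})

g = df.groupby('id')['outcome']
is_yes = df['outcome'].eq('yes')

# 'no' rows directly after or directly before a 'yes' in the same id
no_next_to_yes = ~is_yes & (g.shift().eq('yes') | g.shift(-1).eq('yes'))
# last 'yes' of each run: the next row in the same id is not 'yes' (NaN counts as "not 'yes'")
last_yes = is_yes & g.shift(-1).ne('yes')

out = df[no_next_to_yes | last_yes]
print(out.index.tolist())
# [2, 3, 4, 6, 8, 10, 15, 16, 20, 22, 23, 25, 26]
```

The only logical change from the answer's code is `g.shift(-1).ne('yes')` in place of `g.shift().ne('yes')` for the 'yes' rows; the 'no' condition is the same as `cond1 & df['outcome'].ne('yes')`.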