How to apply multiple conditions to drop/select specific rows from a dataframe (pandas)?

Time:11-12

I have the following dataframe:

    id outcome
0    3      no
1    3      no
2    3      no
3    3     yes
4    3      no
5    5      no
6    5      no
7    5     yes
8    5     yes
9    6      no
10   6      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no
18   6      no
19   7      no
20   7      no
21   7     yes
22   7     yes
23   7      no
24   7      no
25   7      no
26   7     yes

The rows are grouped by id and, within each id, sorted by date in ascending order.

There are a few conditions I want to satisfy.

I want to remove a row if the row after it has the same outcome.

After a 'yes', the next row kept must be the first 'no' that follows it.

Additionally, I want to keep the last 'no' before a 'yes'. A run of 'no's can therefore contribute two rows: its first 'no' (directly after a 'yes') and its last 'no' (directly before the next 'yes').

This is the desired outcome for the above dataframe:

    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
8    5     yes
10   6      no
15   6     yes
16   6      no
20   7      no
22   7     yes
23   7      no
25   7      no
26   7     yes

At the moment I have created several masks like this:

import pandas as pd

df = pd.DataFrame(data={'id':[3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7],
     'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes','yes','yes','yes','yes','no','no','no','no','no','yes','yes','no','no','no','yes']})


m1 = df['outcome']  # the outcome column as a Series
m2 = m1.groupby(df['id']).shift(-1)  # next outcome within each id (values shifted up by 1)
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')  # True for a 'no' directly after a 'yes'

df2 = df[~m1.eq(m2) | m3]  # keep rows whose next outcome differs, plus the first 'no' after a 'yes'
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()  # previous outcome within each id, after the first filter
df3 = df2[~m4.eq(m5)]  # drop rows that repeat the previous kept outcome

With the above, however, I do not keep the first and last 'no's above a 'yes'.
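
To make the problem concrete, here is a self-contained sketch of the same masks applied to the 27-row frame printed at the top. The frame is rebuilt here on the assumption that the printed table is the intended data, and `desired` and `missing` are names introduced only for this check. It compares the surviving index labels with the desired outcome:

```python
import pandas as pd

# Rebuild the 27-row frame shown at the top (assumed to be the intended data).
df = pd.DataFrame({
    'id': [3]*5 + [5]*4 + [6]*10 + [7]*8,
    'outcome': ['no', 'no', 'no', 'yes', 'no',
                'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no',
                'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'yes'],
})

# First pass: same masks as in the attempt above.
m1 = df['outcome']
m2 = m1.groupby(df['id']).shift(-1)
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')
df2 = df[~m1.eq(m2) | m3]

# Second pass: drop rows that repeat the previously kept outcome.
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()
df3 = df2[~m4.eq(m5)]

# Index labels from the desired outcome listed in the question.
desired = [2, 3, 4, 6, 8, 10, 15, 16, 20, 22, 23, 25, 26]
missing = sorted(set(desired) - set(df3.index))

print(df3.index.tolist())
print(missing)
```

On this data the second pass removes row 25, the last 'no' before the final 'yes' of id 7, because the row kept just before it (23) is also a 'no'.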

CodePudding user response:

You are on the right track. Compare each row with the previous and next row within its id group:

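g = df.groupby('id')['outcome']
# True where the previous or the next row within the same id is 'yes'
cond1 = g.shift().eq('yes') | g.shift(-1).eq('yes')
# keep every 'no' adjacent to a 'yes', plus the first 'yes' of each run
out = df[(cond1 & df.outcome.ne('yes')) | (df.outcome.eq('yes') & g.shift().ne('yes'))]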



    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
7    5     yes
10   6      no
11   6     yes
16   6      no
20   7      no
21   7     yes
23   7      no
25   7      no
26   7     yes
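
Note that the output above keeps the first 'yes' of each run (rows 7, 11, 21), whereas the desired outcome in the question keeps the last one (rows 8, 15, 22). If the last 'yes' is what is wanted, a possible variant is to test the next row instead of the previous one for the 'yes' rows. The sketch below is self-contained and rebuilds the 27-row frame from the question. The names `prev_yes`, `next_yes`, `is_yes`, `keep_no` and `keep_yes` are introduced here only for readability:

```python
import pandas as pd

# Rebuild the 27-row frame printed in the question (assumed to be the intended data).
df = pd.DataFrame({
    'id': [3]*5 + [5]*4 + [6]*10 + [7]*8,
    'outcome': ['no', 'no', 'no', 'yes', 'no',
                'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no',
                'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'yes'],
})

g = df.groupby('id')['outcome']
prev_yes = g.shift().eq('yes')     # previous row in the same id is 'yes'
next_yes = g.shift(-1).eq('yes')   # next row in the same id is 'yes'
is_yes = df['outcome'].eq('yes')

# Keep a 'no' when it touches a 'yes' on either side.
keep_no = ~is_yes & (prev_yes | next_yes)
# Keep a 'yes' only when it is the last of its run (next row is not 'yes',
# which includes the case where it is the last row of its id).
keep_yes = is_yes & ~next_yes

out = df[keep_no | keep_yes]
print(out)
```

On the question's data this selects index labels 2, 3, 4, 6, 8, 10, 15, 16, 20, 22, 23, 25 and 26, which matches the desired outcome listed in the question.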