How to remove rows based on next value in a sequence? (pandas)

Time:10-29

I have the following dataframe:

id  date       outcome
3   03/05/2019  no
3   29/05/2019  no
3   04/09/2019  no
3   30/10/2019  yes
3   03/05/2020  no
5   03/12/2019  no
5   26/12/2019  no
5   27/01/2020  yes
5   03/06/2020  yes
6   04/05/2019  no
6   27/10/2019  no
6   26/11/2019  yes
6   28/11/2019  yes
6   29/11/2019  yes
6   20/12/2019  yes
6   27/12/2019  yes
6   29/12/2019  yes
6   03/01/2020  yes
6   14/01/2020  yes
6   11/02/2020  yes
6   13/02/2020  yes
6   18/02/2020  yes
6   13/04/2020  yes
6   14/04/2020  yes
6   24/04/2020  yes
6   30/04/2020  yes
6   05/05/2020  no

The rows are grouped by id and sorted by date in ascending order within each id.

I want to remove a row if the row after it has the same outcome. This is the desired outcome for id 3:

id  date       outcome
3   04/09/2019  no
3   30/10/2019  yes
3   03/05/2020  no

At the moment I have created a mask like this:

m1 = (df['outcome'] == df['outcome'].shift(-1))

but I am unsure what to do next to apply the condition and delete the matching rows. I assume I will need a groupby on the id column...

CodePudding user response:

Does this solve your problem?

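import pandas as pd

rows = []
current = None  # (id, outcome) of the previous row; None so the first row is kept
for i, row in df.iterrows():
    key = (row['id'], row['outcome'])
    # Keep only the first row of each run of identical outcomes within an id
    if key != current:
        current = key
        rows.append(row)
new_df = pd.DataFrame(rows)  # DataFrame.append was removed in pandas 2.0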

CodePudding user response:

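Label each run of consecutive identical outcomes with a cumulative sum, then keep the first row of each (id, run) group: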

m1 = (df['outcome'] != df['outcome'].shift()).cumsum()
out = df.groupby([df['id'], m1]).head(1)

Output:

    id        date outcome
0    3  03/05/2019      no
3    3  30/10/2019     yes
4    3  03/05/2020      no
5    5  03/12/2019      no
7    5  27/01/2020     yes
9    6  04/05/2019      no
11   6  26/11/2019     yes
26   6  05/05/2020      no
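
The output above keeps the first row of each run, whereas the desired output in the question keeps the last one (a row is dropped when the row after it has the same outcome). A minimal sketch of that variant, building on the shift(-1) mask from the question, could look like this. The sample frame only copies the rows for ids 3 and 5, and the names next_outcome and keep are illustrative, not from the original posts.

```python
import pandas as pd

# Sample rows for ids 3 and 5, copied from the question's dataframe
df = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 5, 5, 5, 5],
    "date": ["03/05/2019", "29/05/2019", "04/09/2019", "30/10/2019",
             "03/05/2020", "03/12/2019", "26/12/2019", "27/01/2020",
             "03/06/2020"],
    "outcome": ["no", "no", "no", "yes", "no", "no", "no", "yes", "yes"],
})

# Outcome of the following row within the same id
# (missing for the last row of each id, which has no following row)
next_outcome = df.groupby("id")["outcome"].shift(-1)

# Keep a row when it is the last row of its id,
# or when the following row has a different outcome
keep = next_outcome.isna() | df["outcome"].ne(next_outcome)
out = df[keep]
print(out)
```

For id 3 this leaves the rows dated 04/09/2019, 30/10/2019 and 03/05/2020, which matches the desired output in the question. Replacing .head(1) with .tail(1) in the groupby answer should select the same rows.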