I have the following dataframe:
id date outcome
3 03/05/2019 no
3 29/05/2019 no
3 04/09/2019 no
3 30/10/2019 yes
3 03/05/2020 no
5 03/12/2019 no
5 26/12/2019 no
5 27/01/2020 yes
5 03/06/2020 yes
6 04/05/2019 no
6 27/10/2019 no
6 26/11/2019 yes
6 28/11/2019 yes
6 29/11/2019 yes
6 20/12/2019 yes
6 27/12/2019 yes
6 29/12/2019 yes
6 03/01/2020 yes
6 14/01/2020 yes
6 11/02/2020 yes
6 13/02/2020 yes
6 18/02/2020 yes
6 13/04/2020 yes
6 14/04/2020 yes
6 24/04/2020 yes
6 30/04/2020 yes
6 05/05/2020 no
The rows are grouped by id and sorted by date in ascending order within each id.
I want to remove a row if the next row for the same id has the same outcome, so that only the last row of each run of identical outcomes is kept. This is the desired outcome for id 3:
id date outcome
3 04/09/2019 no
3 30/10/2019 yes
3 03/05/2020 no
So far I have created a mask like this:
m1 = (df['outcome'] == df['outcome'].shift(-1))
but I am unsure what to do next to apply the condition and delete the flagged rows. I assume I will need a groupby on the id column.
CodePudding user response:
Does this solve your problem?
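import pandas as pd

# Keep a row only when its (id, outcome) pair differs from the previous row's,
# i.e. keep the first row of each run within each id.
rows = []
previous = None
for _, row in df.iterrows():
    key = (row['id'], row['outcome'])
    if key != previous:
        previous = key
        rows.append(row)
new_df = pd.DataFrame(rows)  # DataFrame.append was removed in pandas 2.0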
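Note that the loop above keeps the first row of each run, while the desired output in the question keeps the last one. A minimal sketch of that variant follows, assuming a small sample frame copied from the question's rows for id 3 and id 5; `keep_last_of_each_run` is a hypothetical helper name, not something from the original answer.

```python
import pandas as pd

# Sample rows for id 3 and id 5, copied from the question.
df = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 5, 5, 5, 5],
    "date": ["03/05/2019", "29/05/2019", "04/09/2019", "30/10/2019", "03/05/2020",
             "03/12/2019", "26/12/2019", "27/01/2020", "03/06/2020"],
    "outcome": ["no", "no", "no", "yes", "no", "no", "no", "yes", "yes"],
})


def keep_last_of_each_run(frame):
    """Hypothetical helper: drop a row when the next row has the same id and outcome."""
    keys = list(zip(frame["id"], frame["outcome"]))
    last = len(keys) - 1
    # A row is kept if it is the final row or its key differs from the next row's key.
    keep = [i == last or keys[i] != keys[i + 1] for i in range(len(keys))]
    return frame.loc[keep]


new_df = keep_last_of_each_run(df)
print(new_df)
```

Comparing the (id, outcome) pair rather than the outcome alone stops a run from carrying over from the last row of one id to the first row of the next.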
CodePudding user response:
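Label each run of consecutive equal outcomes with a cumulative sum, then take the first row of each (id, run) group:
m1 = (df['outcome'] != df['outcome'].shift()).cumsum()
out = df.groupby([df['id'], m1]).head(1)
Printing out gives: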
    id        date outcome
0    3  03/05/2019      no
3    3  30/10/2019     yes
4    3  03/05/2020      no
5    5  03/12/2019      no
7    5  27/01/2020     yes
9    6  04/05/2019      no
11   6  26/11/2019     yes
26   6  05/05/2020      no
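The output above keeps the first row of each run, whereas the desired output in the question (04/09/2019, 30/10/2019, 03/05/2020 for id 3) keeps the last. The sketch below shows two ways to get that, assuming a sample frame copied from the question's rows for id 3 and id 5. The names `run_id`, `out_tail`, `next_outcome` and `out_mask` are illustrative only.

```python
import pandas as pd

# Sample rows for id 3 and id 5, copied from the question.
df = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 5, 5, 5, 5],
    "date": ["03/05/2019", "29/05/2019", "04/09/2019", "30/10/2019", "03/05/2020",
             "03/12/2019", "26/12/2019", "27/01/2020", "03/06/2020"],
    "outcome": ["no", "no", "no", "yes", "no", "no", "no", "yes", "yes"],
})

# Label runs of consecutive equal outcomes with a cumulative sum.
run_id = (df["outcome"] != df["outcome"].shift()).cumsum()

# Variant 1: take the last row of each (id, run) group instead of the first.
out_tail = df.groupby([df["id"], run_id]).tail(1)

# Variant 2: finish the mask idea from the question. Shift within each id so the
# last row of an id is compared with NaN and is therefore always kept.
next_outcome = df.groupby("id")["outcome"].shift(-1)
out_mask = df[df["outcome"] != next_outcome]

print(out_tail)
```

Both variants select the same rows on this sample. The second avoids the cumulative sum and is closer to the mask the question started from.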