I have below df, i want to keep the most recent incidence for the ID. Incident range from 1 - 3. 1 is most recent and 3 is least.
ID | Incident |
---|---|
001 | 1 |
001 | 2 |
001 | 3 |
002 | 1 |
002 | 1 |
002 | 1 |
002 | 1 |
002 | 3 |
A003 | 2 |
A003 | 2 |
A003 | 3 |
Output
ID | Incident |
---|---|
001 | 1 |
002 | 1 |
002 | 1 |
002 | 1 |
002 | 1 |
A003 | 2 |
A003 | 2 |
CodePudding user response:
You can first group your dataset based on your ID, and select the minimum for each group using groupby()
and min()
.
Then you join this result with your original dataframe using merge
and with parameter indicator
set to True
which will show which rows belong to the minimum values.
Last step will be to using loc
to filter your dataframe will get you your answer:
m = df.groupby('ID',as_index=False).agg({'Incident':'min'})
out = df.loc[pd.merge(df,m,how='left',indicator=True)._merge.eq('both')]
prints back:
print(out)
ID Incident
0 1 1
3 2 1
4 2 1
5 2 1
6 2 1
8 A003 2
9 A003 2