Assume I have a DataFrame like:
id | length | width |
---|---|---|
2 | 12 | 12 |
2 | 13 | 15 |
4 | 14 | 19 |
4 | 11 | 13 |
7 | 34 | 67 |
7 | 33 | 64 |
7 | 40 | 78 |
7 | 22 | 33 |
What I want is for each id to appear only once, keeping only the row with the minimum value in the length column.
The result would be:
id | length | width |
---|---|---|
2 | 12 | 12 |
4 | 11 | 13 |
7 | 22 | 33 |
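To make the question reproducible, the sample table above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the three columns are plain integers, as the table suggests.

```python
import pandas as pd

# Sample data from the question, one list per column
df = pd.DataFrame({
    "id":     [2, 2, 4, 4, 7, 7, 7, 7],
    "length": [12, 13, 14, 11, 34, 33, 40, 22],
    "width":  [12, 15, 19, 13, 67, 64, 78, 33],
})
print(df)
```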
Answer:
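Try groupby with idxmin, which returns the index label of the row with the minimum length in each id group, and then select those rows with .loc: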
out = df.loc[df.groupby('id')['length'].idxmin()]
Output:
id length width
0 2 12 12
3 4 11 13
7 7 22 33
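A self-contained version of the idxmin approach, run on the question's sample data, is sketched below. It also shows a different technique for comparison (sort by length, then drop_duplicates on id); that alternative is an addition, not part of the answer above. Using kind="stable" is an assumption meant to make ties resolve to the first row, the same way idxmin does.

```python
import pandas as pd

df = pd.DataFrame({
    "id":     [2, 2, 4, 4, 7, 7, 7, 7],
    "length": [12, 13, 14, 11, 34, 33, 40, 22],
    "width":  [12, 15, 19, 13, 67, 64, 78, 33],
})

# For each id, idxmin gives the index label of the row with the smallest length
labels = df.groupby("id")["length"].idxmin()
out = df.loc[labels]

# Alternative: sort by length (stable, so ties keep their original order),
# keep the first row seen for each id, then restore id order
alt = (
    df.sort_values("length", kind="stable")
      .drop_duplicates("id")
      .sort_values("id")
)

print(out)
print(alt)
```

Both keep the original index labels (0, 3 and 7 here), so the selected rows can still be traced back to df.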
Answer:
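I believe you updated your request in the comments on other answers. The code below should give the result you are expecting: for rows with a non-negative length it keeps the minimum length per id, and for rows with a negative length it keeps the maximum (the value closest to zero).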
import pandas as pd
df_neg = df.loc[df[df['length'].lt(0)].groupby('id')['length'].idxmax()]
df_pos = df.loc[df[df['length'].ge(0)].groupby('id')['length'].idxmin()]
df_con = pd.concat([df_pos, df_neg]).sort_values('id').reset_index(drop=True)
df_con
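The question's sample data has no negative lengths, so the sketch below runs the same split-by-sign logic on a small hypothetical DataFrame (all values made up) to show what each branch keeps. It filters with boolean indexing rather than mask plus dropna, which selects the same rows without also dropping rows that happen to contain NaN in other columns.

```python
import pandas as pd

# Hypothetical data: values invented to exercise both the negative
# and the non-negative branch
df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 3, 3],
    "length": [5, 3, -4, -7, -2, 8, 6],
    "width":  [10, 11, 12, 13, 14, 15, 16],
})

# Negative lengths: keep the largest (closest to zero) per id
neg = df[df["length"].lt(0)]
df_neg = df.loc[neg.groupby("id")["length"].idxmax()]

# Non-negative lengths: keep the smallest per id
pos = df[df["length"].ge(0)]
df_pos = df.loc[pos.groupby("id")["length"].idxmin()]

# Combine, order by id, and renumber the rows
df_con = (
    pd.concat([df_pos, df_neg])
      .sort_values("id", kind="stable")
      .reset_index(drop=True)
)
print(df_con)
```

Note that an id with both negative and non-negative lengths (id 1 here) ends up with two rows, one from each branch; if exactly one row per id is required, the request would need a single rule such as smallest absolute length.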