I have a Pandas Dataframe with this table:
name | phase | value |
---|---|---|
BOB | 1 | .9 |
BOB | 2 | .05 |
BOB | 3 | .05 |
JOHN | 2 | .45 |
JOHN | 3 | .45 |
JOHN | 4 | .05 |
FRANK | 1 | .4 |
FRANK | 3 | .6 |
I want to find which phase has the maximum value in column 'value' and if the maximum value is the same then show the first value or random value for phase with maximum value
Desired result table:
name | phase | value |
---|---|---|
BOB | 1 | .9 |
JOHN | 2 | .45 |
FRANK | 3 | .6 |
my solution was this:
df.groupby(['name'])[['phase','value']].max()
but it was returing incorrect values.
EDIT: slightly changed the values to represent the problem more accurately
CodePudding user response:
You don't need to use groupby
. Sort values by value
and phase
(adjust the order if necessary) and drop duplicates by name
:
out = (df.sort_values(['value', 'phase'], ascending=[False, True])
.drop_duplicates('name')
.sort_index(ignore_index=True))
print(out)
# Output
name phase value
0 BOB 1 0.90
1 JOHN 2 0.45
2 FRANK 3 0.60
CodePudding user response:
Try to sort the dataframe first:
df = df.sort_values(
by=["name", "value", "phase"], ascending=[True, False, True]
)
x = df.groupby("name", as_index=False).first()
print(x)
Prints:
name phase value
0 BOB 1 0.90
1 FRANK 1 0.60
2 JOHN 1 0.45
CodePudding user response:
A possible solution, that could avoid sorting is with groupby:
df.loc[df.groupby('name', sort = False).value.idxmax()]
name phase value
0 BOB 1 0.90
3 JOHN 2 0.45
7 FRANK 3 0.60
CodePudding user response:
You may check
out = df.sort_values('value',ascending=False).drop_duplicates('name').sort_index()
Out[434]:
name phase value
0 BOB 1 0.90
3 JOHN 2 0.45
7 FRANK 3 0.60