Home > OS >  Python Pandas get maximum with respect to other number
Python Pandas get maximum with respect to other number

Time:04-24

I have a Pandas Dataframe with this table:

name phase value
BOB 1 .9
BOB 2 .05
BOB 3 .05
JOHN 2 .45
JOHN 3 .45
JOHN 4 .05
FRANK 1 .4
FRANK 3 .6

I want to find which phase has the maximum value in column 'value' and if the maximum value is the same then show the first value or random value for phase with maximum value

Desired result table:

name phase value
BOB 1 .9
JOHN 2 .45
FRANK 3 .6

my solution was this:

df.groupby(['name'])[['phase','value']].max() 

but it was returing incorrect values.

EDIT: slightly changed the values to represent the problem more accurately

CodePudding user response:

You don't need to use groupby. Sort values by value and phase (adjust the order if necessary) and drop duplicates by name:

out = (df.sort_values(['value', 'phase'], ascending=[False, True])
         .drop_duplicates('name')
         .sort_index(ignore_index=True))
print(out)

# Output
    name  phase  value
0    BOB      1   0.90
1   JOHN      2   0.45
2  FRANK      3   0.60

CodePudding user response:

Try to sort the dataframe first:

df = df.sort_values(
    by=["name", "value", "phase"], ascending=[True, False, True]
)

x = df.groupby("name", as_index=False).first()
print(x)

Prints:

    name  phase  value
0    BOB      1   0.90
1  FRANK      1   0.60
2   JOHN      1   0.45

CodePudding user response:

A possible solution, that could avoid sorting is with groupby:

df.loc[df.groupby('name', sort = False).value.idxmax()]

    name  phase  value
0    BOB      1   0.90
3   JOHN      2   0.45
7  FRANK      3   0.60

CodePudding user response:

You may check

out = df.sort_values('value',ascending=False).drop_duplicates('name').sort_index()
Out[434]: 
    name  phase  value
0    BOB      1   0.90
3   JOHN      2   0.45
7  FRANK      3   0.60
  • Related