Pandas - calculate within group the maximum value for a minimum within column-CodePudding

I have a large dataframe df. From this dataframe, I want to take within the group defined by its unique values in ['depth','period','direction'] the rows which have the smallest value of 'tension' that gives the maximum value of 'height'. So I want to calculate the minimum amount of tension that gives the maximum height within the specified group. So from the following dataframe df:

depth,height,period,direction,tension,parameter1
8,2.75,4,180,5,16.7
8,3,4,180,10,21.6
8,3,4,180,15,26.6
8,3,4,180,20,31.6
15,2.5,7,45,5,22.8
15,2.5,7,45,10,27.8
15,2.75,7,45,15,32.7
15,3,7,45,20,37.8
22,2.5,7,67.5,5,28.9
22,2.75,7,67.5,10,33.9
22,2.75,7,67.5,15,38.9
22,2.75,7,67.5,20,43.8

I want to return a new dataframe df1:

depth,height,period,direction,tension,parameter1
8,3,4,180,10,21.6
15,3,7,45,20,37.8
22,2.75,7,67.5,10,33.9

Any idea how to do this?

CodePudding user response：

Sort tension and height in ascending and descending order respectively, and then take the first row by groups:

(
  df.sort_values('tension')
    .sort_values('height', ascending=False, kind='mergesort')
    .groupby(['depth', 'period', 'direction'])
    .head(1)
)
   depth  height  period  direction  tension  parameter1
1      8    3.00       4      180.0       10        21.6
7     15    3.00       7       45.0       20        37.8
9     22    2.75       7       67.5       10        33.9

CodePudding user response：

Try:

df = df.sort_values(
    by=["depth", "period", "direction", "height", "tension"],
    ascending=[True, True, True, True, False],
)
df = df.drop_duplicates(subset=["depth", "period", "direction"], keep="last")
print(df)

Prints:

   depth  height  period  direction  tension  parameter1
1      8    3.00       4      180.0       10        21.6
7     15    3.00       7       45.0       20        37.8
9     22    2.75       7       67.5       10        33.9