I have a large dataframe df. From this dataframe, I want to take within the group defined by its unique values in ['depth','period','direction'] the rows which have the smallest value of 'tension' that gives the maximum value of 'height'. So I want to calculate the minimum amount of tension that gives the maximum height within the specified group. So from the following dataframe df:
depth,height,period,direction,tension,parameter1
8,2.75,4,180,5,16.7
8,3,4,180,10,21.6
8,3,4,180,15,26.6
8,3,4,180,20,31.6
15,2.5,7,45,5,22.8
15,2.5,7,45,10,27.8
15,2.75,7,45,15,32.7
15,3,7,45,20,37.8
22,2.5,7,67.5,5,28.9
22,2.75,7,67.5,10,33.9
22,2.75,7,67.5,15,38.9
22,2.75,7,67.5,20,43.8
I want to return a new dataframe df1:
depth,height,period,direction,tension,parameter1
8,3,4,180,10,21.6
15,3,7,45,20,37.8
22,2.75,7,67.5,10,33.9
Any idea how to do this?
CodePudding user response:
Sort tension
and height
in ascending and descending order respectively, and then take the first row by groups:
(
df.sort_values('tension')
.sort_values('height', ascending=False, kind='mergesort')
.groupby(['depth', 'period', 'direction'])
.head(1)
)
depth height period direction tension parameter1
1 8 3.00 4 180.0 10 21.6
7 15 3.00 7 45.0 20 37.8
9 22 2.75 7 67.5 10 33.9
CodePudding user response:
Try:
df = df.sort_values(
by=["depth", "period", "direction", "height", "tension"],
ascending=[True, True, True, True, False],
)
df = df.drop_duplicates(subset=["depth", "period", "direction"], keep="last")
print(df)
Prints:
depth height period direction tension parameter1
1 8 3.00 4 180.0 10 21.6
7 15 3.00 7 45.0 20 37.8
9 22 2.75 7 67.5 10 33.9