I have a Pandas dataframe df with column names par1,par2,..,par7. I want to select a subset of the total dataframe. Within the group par1,par2,par3 I want to select the rows for which par4 has the lowest value and par5 has the highest value in that group. So from the dataframe:
Par1,Par2,Par3,Par4,Par5,Par6,Par7
8,7,0,5,1.5,16.66,20.55
8,7,0,10,1.5,21.64,26.32
8,7,0,15,1.5,26.62,32.58
8,7,0,20,1.5,31.62,37.17
8,7,0,5,3,16.66,27.67
8,7,0,10,3,21.64,36.39
8,7,0,15,3,26.62,46.95
8,7,0,20,3,31.62,54.05
I want to return:
Par1,Par2,Par3,Par4,Par5,Par6,Par7
8,7,0,5,3,16.66,27.67
Because here par3 is 5.0 (lower than 10,15,20) and par4 is 3 (higher than 1.5).
CodePudding user response:
Use DataFrame.sort_values
with DataFrame.drop_duplicates
:
df = (df.sort_values(['Par1','Par2','Par3', 'Par4', 'Par5'],
ascending=[True, True, True, True, False])
.drop_duplicates(['Par1','Par2','Par3']))
print (df)
Par1 Par2 Par3 Par4 Par5 Par6 Par7
4 8 7 0 5 3.0 16.66 27.67