I have a problem I have been trying to solve myself, but I am stuck, so I am bringing it here for help.
I have a pandas dataframe as below:
x1 y1 x2 y2 confidence class
0 238.288834 118.716125 300.878754 137.672791 0.885205 0.0
1 238.288834 118.716125 300.878754 137.672791 0.881469 1.0
2 238.288834 118.716125 300.878754 137.672791 0.879645 5.0
3 248.977844 115.054123 321.307007 141.315460 0.876451 0.0
4 248.977844 115.054123 321.307007 141.315460 0.872008 1.0
5 15.0 10.0 2298.9 187.0 0.70 0.0
For each unique combination of x1, y1, x2 and y2, I would like to return the row with the highest confidence value.
Explanation: in the DataFrame above, rows 0, 1 and 2 have the same x1, y1, x2 and y2 values, but I only want row 0, because it has the highest confidence score of the three (0.885205).
The expected outcome would look like:
x1 y1 x2 y2 confidence class
0 238.288834 118.716125 300.878754 137.672791 0.885205 0.0
1 248.977844 115.054123 321.307007 141.315460 0.876451 0.0
2 15.0 10.0 2298.9 187.0 0.70 0.0
Answer:
You can try:
df.groupby(['x1','y1','x2', "y2"], as_index=False, sort=False)['confidence'].max()
Result:
x1 y1 x2 y2 confidence
0 238.288834 118.716125 300.878754 137.672791 0.885205
1 248.977844 115.054123 321.307007 141.315460 0.876451
2 15.000000 10.000000 2298.900000 187.000000 0.700000
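Note that `groupby(...)['confidence'].max()` returns only the grouping columns and `confidence`, so the `class` column is dropped. A minimal sketch, assuming the column names and sample values from the question, that keeps every column is to compare each row's confidence with its group maximum using `transform('max')`. Unlike `idxmax()`, this keeps all tied rows if two rows in the same group share the maximum confidence.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'x1': [238.288834, 238.288834, 238.288834, 248.977844, 248.977844, 15.0],
    'y1': [118.716125, 118.716125, 118.716125, 115.054123, 115.054123, 10.0],
    'x2': [300.878754, 300.878754, 300.878754, 321.307007, 321.307007, 2298.9],
    'y2': [137.672791, 137.672791, 137.672791, 141.315460, 141.315460, 187.0],
    'confidence': [0.885205, 0.881469, 0.879645, 0.876451, 0.872008, 0.70],
    'class': [0.0, 1.0, 5.0, 0.0, 1.0, 0.0],
})

keys = ['x1', 'y1', 'x2', 'y2']

# Broadcast each group's maximum confidence back onto every row of that group
group_max = df.groupby(keys)['confidence'].transform('max')

# Keep the rows whose confidence equals their group's maximum (ties are all kept)
best = df[df['confidence'] == group_max]
print(best)
```

Because the selection is a boolean mask, the rows stay in their original order and keep their original index labels.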
Or, if you want to show all columns, use idxmax() with .loc:
df.loc[df.groupby(['x1','y1','x2', "y2"], sort=False)['confidence'].idxmax()]
Result:
x1 y1 x2 y2 confidence class
0 238.288834 118.716125 300.878754 137.672791 0.885205 0.0
3 248.977844 115.054123 321.307007 141.315460 0.876451 0.0
5 15.000000 10.000000 2298.900000 187.000000 0.700000 0.0
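The `idxmax()` result keeps the original index labels (0, 3, 5), while the expected outcome in the question is numbered 0, 1, 2. A small sketch, again assuming the question's sample data, that chains `reset_index(drop=True)` to renumber the rows:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'x1': [238.288834, 238.288834, 238.288834, 248.977844, 248.977844, 15.0],
    'y1': [118.716125, 118.716125, 118.716125, 115.054123, 115.054123, 10.0],
    'x2': [300.878754, 300.878754, 300.878754, 321.307007, 321.307007, 2298.9],
    'y2': [137.672791, 137.672791, 137.672791, 141.315460, 141.315460, 187.0],
    'confidence': [0.885205, 0.881469, 0.879645, 0.876451, 0.872008, 0.70],
    'class': [0.0, 1.0, 5.0, 0.0, 1.0, 0.0],
})

keys = ['x1', 'y1', 'x2', 'y2']

# Index label of the highest-confidence row in each group, groups in order of first appearance
idx = df.groupby(keys, sort=False)['confidence'].idxmax()

# Select those rows, then renumber the index 0..n-1 as in the expected outcome
result = df.loc[idx].reset_index(drop=True)
print(result)
```

`drop=True` discards the old labels instead of inserting them as a new `index` column.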
Answer:
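# sort by confidence (highest first), then keep the first row for each (x1, y1, x2, y2) box
best = df.sort_values('confidence', ascending=False).drop_duplicates(subset=['x1', 'y1', 'x2', 'y2'])
# lambda with filter: put your own condition after the lambda
df1 = list(filter(lambda x: x > 18, df['x1']))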
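The `filter`/`lambda` line above returns a plain Python list of the `x1` values greater than 18, evaluated one element at a time in Python. A sketch of a vectorized alternative, assuming only the `x1` column from the question's sample data, uses a boolean mask with `.loc` and converts the result to a list:

```python
import pandas as pd

# Only the x1 column from the question's sample data is needed here
df = pd.DataFrame({
    'x1': [238.288834, 238.288834, 238.288834, 248.977844, 248.977844, 15.0],
})

# Python-level filter, as in the answer: checks each value one by one
via_filter = list(filter(lambda x: x > 18, df['x1']))

# Vectorized equivalent: boolean mask on the column, then convert to a list
via_mask = df.loc[df['x1'] > 18, 'x1'].tolist()

print(via_filter == via_mask)
```

On a column this small the difference does not matter, but the mask form stays inside pandas and scales better on large DataFrames.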