Return unique row values from a pandas dataframe based on some conditions [duplicate]

I have been trying to solve this problem myself but got stuck partway, so I am asking here for help.

I have a pandas dataframe as below:

           x1          y1           x2          y2  confidence  class
0  238.288834  118.716125   300.878754  137.672791    0.885205    0.0
1  238.288834  118.716125   300.878754  137.672791    0.881469    1.0
2  238.288834  118.716125   300.878754  137.672791    0.879645    5.0
3  248.977844  115.054123   321.307007  141.315460    0.876451    0.0
4  248.977844  115.054123   321.307007  141.315460    0.872008    1.0
5   15.000000   10.000000  2298.900000  187.000000    0.700000    0.0

For each unique combination of x1, y1, x2 and y2 values, I would like to return the row with the highest confidence value.

Explanation: in the dataframe above, rows 0, 1 and 2 have the same x1, y1, x2 and y2 values, but I would like to return only row 0, since it has the highest confidence value (0.885205).

The expected outcome would look like:

           x1          y1           x2          y2  confidence  class
0  238.288834  118.716125   300.878754  137.672791    0.885205    0.0
1  248.977844  115.054123   321.307007  141.315460    0.876451    0.0
2   15.000000   10.000000  2298.900000  187.000000    0.700000    0.0

CodePudding user response：

You can try:

df.groupby(['x1','y1','x2', "y2"], as_index=False, sort=False)['confidence'].max()

Result:

           x1          y1           x2          y2  confidence
0  238.288834  118.716125   300.878754  137.672791    0.885205
1  248.977844  115.054123   321.307007  141.315460    0.876451
2   15.000000   10.000000  2298.900000  187.000000    0.700000

Or, if you want to show all columns, use idxmax() together with .loc:

df.loc[df.groupby(['x1','y1','x2', "y2"], sort=False)['confidence'].idxmax()]

Result:

           x1          y1           x2          y2  confidence  class
0  238.288834  118.716125   300.878754  137.672791    0.885205    0.0
3  248.977844  115.054123   321.307007  141.315460    0.876451    0.0
5   15.000000   10.000000  2298.900000  187.000000    0.700000    0.0
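The idxmax() approach above can be tried end to end with the sketch below. It rebuilds the question's dataframe by hand; the names box_cols, best_idx and best are made up for illustration, and the reset_index(drop=True) call is an optional extra, added only because the expected outcome in the question is numbered 0, 1, 2 rather than 0, 3, 5.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x1": [238.288834, 238.288834, 238.288834, 248.977844, 248.977844, 15.0],
    "y1": [118.716125, 118.716125, 118.716125, 115.054123, 115.054123, 10.0],
    "x2": [300.878754, 300.878754, 300.878754, 321.307007, 321.307007, 2298.9],
    "y2": [137.672791, 137.672791, 137.672791, 141.315460, 141.315460, 187.0],
    "confidence": [0.885205, 0.881469, 0.879645, 0.876451, 0.872008, 0.70],
    "class": [0.0, 1.0, 5.0, 0.0, 1.0, 0.0],
})

# Columns that identify one bounding box (illustrative name).
box_cols = ["x1", "y1", "x2", "y2"]

# Index label of the highest-confidence row within each box group;
# sort=False keeps the groups in order of first appearance.
best_idx = df.groupby(box_cols, sort=False)["confidence"].idxmax()

# Select those rows with all columns, then renumber them from 0.
best = df.loc[best_idx].reset_index(drop=True)
print(best)
```

Note that if two rows in a group tie on confidence, idxmax() returns the first of them.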

CodePudding user response：

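Sort by confidence, then use drop_duplicates on the coordinate columns (the dataframe has no 'type' column, so the subset must name x1, y1, x2 and y2):

df.sort_values('confidence', ascending=False).drop_duplicates(subset=['x1', 'y1', 'x2', 'y2'])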

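# filter with a lambda
# put your own condition after the lambda (x > 18 is only an example)
# note: this returns a plain list of matching x1 values, not dataframe rows

df1 = list(filter(lambda x: x > 18, df['x1']))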
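As a rough, self-contained sketch of the drop_duplicates route for the question's data: sorting by confidence in descending order first means that keeping the first row of each coordinate group keeps the most confident one. The names box_cols, deduped and wide are illustrative, and the threshold 18 is just the example value from the answer above. The boolean mask at the end is a swapped-in alternative to filter with a lambda, since it keeps whole rows instead of a list of x1 values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x1": [238.288834, 238.288834, 238.288834, 248.977844, 248.977844, 15.0],
    "y1": [118.716125, 118.716125, 118.716125, 115.054123, 115.054123, 10.0],
    "x2": [300.878754, 300.878754, 300.878754, 321.307007, 321.307007, 2298.9],
    "y2": [137.672791, 137.672791, 137.672791, 141.315460, 141.315460, 187.0],
    "confidence": [0.885205, 0.881469, 0.879645, 0.876451, 0.872008, 0.70],
    "class": [0.0, 1.0, 5.0, 0.0, 1.0, 0.0],
})

# Columns that identify one bounding box (illustrative name).
box_cols = ["x1", "y1", "x2", "y2"]

# Highest confidence first, then keep only the first row per box.
deduped = (
    df.sort_values("confidence", ascending=False)
      .drop_duplicates(subset=box_cols, keep="first")
      .reset_index(drop=True)
)
print(deduped)

# Boolean mask: keep whole rows whose x1 exceeds the example threshold.
wide = deduped[deduped["x1"] > 18]
print(wide)
```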