How to filter Pandas data table using two fields in Python?-CodePudding

I have the following pandas data table :

    File_name     River  Confidance     X     Y     W     H  T_Area  Overlap_Area
0   test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1   test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2   test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3   test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4   test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5   test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6   test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7   test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8   test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9   test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0
10  test4.png  BRIDGING    0.133226  1385  1049  1463  1084     100        1645.0

I want to find the records where "T_Area" == 0 or "T_Area" / "Overlap_Area" > 0.5 using groupby in pandas.

Item 10 should be dropped in the output since 100 / 1645 < 0.5

CodePudding user response：

df[df['T Area'].eq(0) | df['T Area'].div(df['Overlap Area']).gt(0.5)]

Output:

   File_name     River  Confidance     X     Y     W     H  T_Area  Overlap_Area
0  test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7  test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8  test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9  test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0

CodePudding user response：

You can use:

m1 = df['T Area'] == 0
m2 = df['T Area'] / df['Overlap Area'] > 0.5
out = df[m1 | m2]
print(out)

# Output
   File name     River  Confidance     X     Y     W     H  T Area  Overlap Area
0  test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7  test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8  test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9  test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0

Update

If you want to remove all rows from a group (file name) where at least one row violate the conditions, use groupby_transform:

out = df[(m1 | m2).groupby(df['File name']).transform(min)]
print(out)

# Output
   File name     River  Confidance    X     Y     W     H  T Area  Overlap Area
0  test1.png  BRIDGING    0.587851  739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243  980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472  966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194  275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075  944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732  929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589  886  1199   963  1248       0           0.0

CodePudding user response：

I hope this will help you:

df.groupby(['File_name']).apply(lambda x: x[(x['T_Area']/x['Overlap_Area']>0.5) | (x['T_Area']==0)])