I have the following pandas data table :
File_name River Confidance X Y W H T_Area Overlap_Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
10 test4.png BRIDGING 0.133226 1385 1049 1463 1084 100 1645.0
I want to find the records where "T_Area" == 0
or "T_Area" / "Overlap_Area" > 0.5
using groupby
in pandas.
Item 10 should be dropped in the output since 100 / 1645 < 0.5
CodePudding user response:
df[df['T Area'].eq(0) | df['T Area'].div(df['Overlap Area']).gt(0.5)]
Output:
File_name River Confidance X Y W H T_Area Overlap_Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
CodePudding user response:
You can use:
m1 = df['T Area'] == 0
m2 = df['T Area'] / df['Overlap Area'] > 0.5
out = df[m1 | m2]
print(out)
# Output
File name River Confidance X Y W H T Area Overlap Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
Update
If you want to remove all rows from a group (file name) where at least one row violate the conditions, use groupby_transform
:
out = df[(m1 | m2).groupby(df['File name']).transform(min)]
print(out)
# Output
File name River Confidance X Y W H T Area Overlap Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
CodePudding user response:
I hope this will help you:
df.groupby(['File_name']).apply(lambda x: x[(x['T_Area']/x['Overlap_Area']>0.5) | (x['T_Area']==0)])