Largest Number from a text file using pandas
CodePudding user response:
IIUC, you want the 5 largest SNR values for each ID:
out = df.groupby('ID')['SNR'].nlargest(5).reset_index('ID')
print(out)
# Output
                   ID  SNR
9   J05062845 7149258  397
8   J05062845 7149258  281
2   J07451689 2804046  257
7   J07451689 2804046  222
1   J07451689 2804046  217
5   J07451689 2804046  206
0   J07451689 2804046  200
13  J15170588 7149258  495
10  J15170588 7149258  431
12  J15170588 7149258  411
11  J15170588 7149258  347
18  J18255915 6533486  403
16  J18255915 6533486  349
19  J18255915 6533486  332
17  J18255915 6533486  321
15  J18255915 6533486  317
22  J19420540 5029382  721
23  J19420540 5029382  350
20  J19420540 5029382  328
21  J19420540 5029382  305
Note: if you want to keep your index ordered, append .sort_index() or .sort_index(ignore_index=True) after reset_index('ID').
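To see what the two sort_index variants do, here is a minimal sketch on made-up data. The sample text, the whitespace separator, and the names top, top_in_order and top_renumbered are assumptions for illustration only (the real file layout is not shown in the question), and n is 2 instead of 5 to keep it short.

```python
import io
import pandas as pd

# Hypothetical whitespace-separated text; stands in for the real file.
text = """ID SNR
A 10
A 20
A 30
B 5
B 50
B 40
B 45
"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Top 2 SNR values per ID. The original row labels survive as the index,
# ordered by group and then by descending SNR.
top = df.groupby("ID")["SNR"].nlargest(2).reset_index("ID")

# Same rows, put back into the original file order.
top_in_order = top.sort_index()

# Same rows in file order, but with the index renumbered from 0.
top_renumbered = top.sort_index(ignore_index=True)

print(top)
print(top_in_order)
print(top_renumbered)
```

Plain sort_index() keeps the original row labels, which is what you need if you later want to match the rows back against df. The ignore_index=True variant discards them.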
Update
"I also need the deleted lines as a separate output."
Use a boolean mask:
m = df.index.isin(df.groupby('ID')['SNR'].nlargest(5).reset_index('ID').index)
df1 = df.loc[m]   # rows among the 5 largest SNR of their ID
df2 = df.loc[~m]  # all remaining (excluded) rows
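A small self-contained check of the mask approach, again on made-up data (the sample values and the names kept and excluded are assumptions, and n is 2 rather than 5):

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "B", "B"],
    "SNR": [10, 20, 30, 5, 50, 40, 45],
})

# Row labels of the top 2 SNR values per ID.
top_idx = df.groupby("ID")["SNR"].nlargest(2).reset_index("ID").index

# True for rows that made the cut, False for the rest.
m = df.index.isin(top_idx)

kept = df.loc[m]       # top 2 per ID, in original order
excluded = df.loc[~m]  # everything else

print(kept)
print(excluded)
```

Every row lands in exactly one of the two frames. Note that this relies on df having a unique index: with duplicate labels, isin would match every row sharing a label.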