Largest Number from a text file using pandas

Time:03-01

Largest Number from a text file using pandas

CodePudding user response:

IIUC, you want the rows with the five largest SNR values for each ID:

out = df.groupby('ID')['SNR'].nlargest(5).reset_index('ID')
print(out)

# Output
                   ID  SNR
9   J05062845 7149258  397
8   J05062845 7149258  281
2   J07451689 2804046  257
7   J07451689 2804046  222
1   J07451689 2804046  217
5   J07451689 2804046  206
0   J07451689 2804046  200
13  J15170588 7149258  495
10  J15170588 7149258  431
12  J15170588 7149258  411
11  J15170588 7149258  347
18  J18255915 6533486  403
16  J18255915 6533486  349
19  J18255915 6533486  332
17  J18255915 6533486  321
15  J18255915 6533486  317
22  J19420540 5029382  721
23  J19420540 5029382  350
20  J19420540 5029382  328
21  J19420540 5029382  305

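Note: the result is grouped by ID and sorted by SNR in descending order within each group. To restore the original row order, append .sort_index() after reset_index('ID'), or .sort_index(ignore_index=True) to also renumber the rows from 0.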
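To see the effect of the sort_index() variants on something reproducible, here is a minimal sketch. The frame is made-up data (the IDs "A" and "B" and the SNR values are hypothetical, not the asker's file), and n=2 is used instead of 5 only to keep it short.

```python
import pandas as pd

# Hypothetical data standing in for the asker's file.
df = pd.DataFrame({
    "ID":  ["A", "A", "A", "B", "B", "B", "B"],
    "SNR": [10, 20, 30, 5, 50, 40, 45],
})

# Top 2 SNR values per ID; the original row labels survive as the index.
top = df.groupby("ID")["SNR"].nlargest(2).reset_index("ID")
print(top.index.tolist())             # [2, 1, 4, 6]

# .sort_index() puts the surviving rows back in their original order.
top_sorted = top.sort_index()
print(top_sorted.index.tolist())      # [1, 2, 4, 6]

# ignore_index=True also renumbers the rows from 0.
top_renumbered = top.sort_index(ignore_index=True)
print(top_renumbered.index.tolist())  # [0, 1, 2, 3]
```

Keeping the original labels (plain .sort_index()) is the safer choice if you still need to match rows back to df afterwards.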

Update

"I also need the deleted lines as a separate output."

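Use a boolean mask built from the index labels of the rows that were kept (this relies on df having a unique index):

m = df.index.isin(df.groupby('ID')['SNR'].nlargest(5).reset_index('ID').index)
df1 = df.loc[m]   # kept rows: top 5 SNR per ID
df2 = df.loc[~m]  # excluded rows: everything else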
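As a quick sanity check of the mask approach, the sketch below splits a small made-up frame into kept and excluded rows and confirms that the two parts together cover every row. The data, the variable names kept and dropped, and n=2 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data, not the asker's file.
df = pd.DataFrame({
    "ID":  ["A", "A", "A", "B", "B", "B", "B"],
    "SNR": [10, 20, 30, 5, 50, 40, 45],
})

# Row labels of the top 2 SNR values per ID.
top = df.groupby("ID")["SNR"].nlargest(2).reset_index("ID")

# True where a row of df is among the kept labels, False otherwise.
m = df.index.isin(top.index)

kept = df.loc[m]      # top rows, in original order
dropped = df.loc[~m]  # the remaining rows

print(kept["SNR"].tolist())     # [20, 30, 50, 45]
print(dropped["SNR"].tolist())  # [10, 5, 40]

# The two parts are disjoint and together make up the whole frame.
assert len(kept) + len(dropped) == len(df)
```

Because the mask is applied to df itself, both outputs keep all original columns and the original row order.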