a) How to find the largest five SNRs when the ID and SNRs are repeated? And also I want all these three columns as the output. b) I also want the eliminated lines as the output.
FIT ID SNR
1011563.fit, J16142485-3141000 , 36
1011729.fit, J17210134-3757437 , 18
1011730.fit, J17210134-3757437 , 20
1011731.fit, J17210134-3757437 , 20
1011732.fit, J17210134-3757437 , 13
1011914.fit, J17210134-3757437 , 38
1011915.fit, J17210134-3757437 , 26
1011916.fit, J17210134-3757437 , 19
1011917.fit, J17210134-3757437 , 47
1011918.fit, J17210134-3757437 , 25 ´´´
The result should look somewhat like this.
Expected output for a.
FITS ID SNR
```8 1011917.fit J17210134-3757437 47
5 1011914.fit J17210134-3757437 38
0 1011563.fit J16142485-3141000 36
6 1011915.fit J17210134-3757437 26
9 1011918.fit J17210134-3757437 25
3 1011731.fit J17210134-3757437 20
2 1011730.fit J17210134-3757437 20 ´´´
Expected output for b)
``` FITS ID SNR
1 1011729.fit J17210134-3757437 18
6 1011915.fit J17210134-3757437 26
7 1011916.fit J17210134-3757437 19
4 1011732.fit J17210134-3757437 13´´´
CodePudding user response:
You can use groupby_rank
:
rank = df.groupby('ID')['SNR'].rank(method='dense', ascending=False)
a = df[rank <= 5]
b = df[rank > 5]
Output:
>>> a
FIT ID SNR
0 1011563.fit J16142485-3141000 36
2 1011730.fit J17210134-3757437 20
3 1011731.fit J17210134-3757437 20
5 1011914.fit J17210134-3757437 38
6 1011915.fit J17210134-3757437 26
8 1011917.fit J17210134-3757437 47
9 1011918.fit J17210134-3757437 25
>>> b
FIT ID SNR
1 1011729.fit J17210134-3757437 18
4 1011732.fit J17210134-3757437 13
7 1011916.fit J17210134-3757437 19
CodePudding user response:
You can get the min of the largest values per group, then slice:
thresh = df.groupby('ID')['SNR'].nlargest(5).groupby(level=0).min()
m = df['ID'].map(thresh).le(df['SNR'])
a = df[m]
b = df[~m]
output:
# tresh
ID
J16142485-3141000 36
J17210134-3757437 20
Name: SNR, dtype: int64
# a
FIT ID SNR
0 1011563.fit J16142485-3141000 36
2 1011730.fit J17210134-3757437 20
3 1011731.fit J17210134-3757437 20
5 1011914.fit J17210134-3757437 38
6 1011915.fit J17210134-3757437 26
8 1011917.fit J17210134-3757437 47
9 1011918.fit J17210134-3757437 25
# b
FIT ID SNR
1 1011729.fit J17210134-3757437 18
4 1011732.fit J17210134-3757437 13
7 1011916.fit J17210134-3757437 19