I'm trying to count the values in a dataframe that satisfy a certain condition. For example, in this dataframe, I want the count of each value in 'Matched Previous ID' whose corresponding 'Max Score' is higher than 0.55:
   Next ID  Max Score  Matched Previous ID
0        1   0.893201                    1
1        2   0.858763                    2
2        3   0.902589                    3
3        8   0.806605                    8
4       15   0.867527                   15
5       21   0.536942                   21
6       22   0.909944                   22
7       28   0.828891                   28
8       94   0.223704                    4
9       96   0.583676                    4
So it should show that there is only one occurrence for 4 and 0 occurrences for 21, because the 'Max Score' values at index 5 and 8 are less than 0.55. How can I do this? I know that value_counts gives the occurrences without any condition; can it be used with conditions as well?
Answer:
You can try grouping by 'Matched Previous ID' and using the apply method on the 'Max Score' column:
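def counter_(g, threshold=0.55):
    # g is the 'Max Score' Series of one group; keep only the scores above the threshold
    num = len(g[g > threshold])
    return num

# df is the dataframe from the question
df.groupby('Matched Previous ID')['Max Score'].apply(counter_)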
This should give you what you want:
Matched Previous ID
1.0 1
2.0 1
3.0 1
4.0 1
8.0 1
15.0 1
21.0 0
22.0 1
28.0 1
Name: Max Score, dtype: int64
For completeness, if you want to change the threshold, you can pass it through a lambda:
df.groupby('Matched Previous ID')['Max Score'].apply(lambda g: counter_(g, threshold=0.2))
Matched Previous ID
1.0 1
2.0 1
3.0 1
4.0 2
8.0 1
15.0 1
21.0 1
22.0 1
28.0 1
Name: Max Score, dtype: int64
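As a possible alternative to apply, the same counts can probably be obtained without a Python-level function by summing a boolean mask per group. This is only a sketch: the dataframe is rebuilt by hand from the question's sample data, and count_above is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Sample data copied from the question (rebuilt by hand for this sketch)
df = pd.DataFrame({
    "Next ID": [1, 2, 3, 8, 15, 21, 22, 28, 94, 96],
    "Max Score": [0.893201, 0.858763, 0.902589, 0.806605, 0.867527,
                  0.536942, 0.909944, 0.828891, 0.223704, 0.583676],
    "Matched Previous ID": [1, 2, 3, 8, 15, 21, 22, 28, 4, 4],
})


def count_above(frame, threshold=0.55):
    """Hypothetical helper: per-ID count of 'Max Score' values above threshold."""
    # Boolean Series: True where the score exceeds the threshold
    above = frame["Max Score"] > threshold
    # Summing booleans per group counts the True values; groups with no
    # True values still appear in the result with a count of 0
    return above.groupby(frame["Matched Previous ID"]).sum()


counts = count_above(df)
print(counts)
```

Filtering the rows first and then calling value_counts on 'Matched Previous ID' would also count the matches, but IDs with no rows above the threshold (such as 21 here) would be missing from the result instead of showing 0.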