Pandas - df.value_counts() with condition?

Time:10-13

I'm trying to count the values in a DataFrame column that satisfy a certain condition. For example, in this DataFrame, I want the count of each value in 'Matched Previous ID', counting only the rows whose 'Max Score' is higher than 0.55:

    Next ID  Max Score          Matched Previous ID
0        1   0.893201                    1
1        2   0.858763                    2
2        3   0.902589                    3
3        8   0.806605                    8
4       15   0.867527                   15
5       21   0.536942                   21
6       22   0.909944                   22
7       28   0.828891                   28
8       94   0.223704                    4
9       96   0.583676                    4

So it should show only one occurrence for 4 and 0 occurrences for 21, because the 'Max Score' values at index 5 and 8 are less than 0.55. How can I do this? I know that df.value_counts() gives the occurrences without any condition; can it be used with conditions as well?

CodePudding user response:

You can group by 'Matched Previous ID' and use the apply method to count the scores above the threshold in each group:

def counter_(g, threshold=0.55):
    # g is the 'Max Score' Series of one group; keep only scores above the threshold
    num = len(g[g > threshold])
    return num

df.groupby('Matched Previous ID')['Max Score'].apply(counter_)

This should give you what you want:

Matched Previous ID
1.0     1
2.0     1
3.0     1
4.0     1
8.0     1
15.0    1
21.0    0
22.0    1
28.0    1
Name: Max Score, dtype: int64
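
As a rough, self-contained sketch of the approach above, the snippet below rebuilds the question's table by hand (with integer IDs, so the index will print as 1, 2, ... rather than 1.0, 2.0, ...) and runs the same groupby/apply. It also shows one possible vectorized alternative that sums a boolean mask per group. The variable names `above` and `counts_vectorized` are illustrative and not from the original answer.

```python
import pandas as pd

# Data copied by hand from the table in the question.
df = pd.DataFrame({
    "Next ID": [1, 2, 3, 8, 15, 21, 22, 28, 94, 96],
    "Max Score": [0.893201, 0.858763, 0.902589, 0.806605, 0.867527,
                  0.536942, 0.909944, 0.828891, 0.223704, 0.583676],
    "Matched Previous ID": [1, 2, 3, 8, 15, 21, 22, 28, 4, 4],
})

def counter_(g, threshold=0.55):
    # Count the scores in one group that are above the threshold.
    return len(g[g > threshold])

# Same call as in the answer.
counts = df.groupby("Matched Previous ID")["Max Score"].apply(counter_)

# Alternative (an assumption, not part of the answer): build a boolean mask
# first, then sum it per group. True counts as 1 and False as 0.
above = df["Max Score"].gt(0.55)
counts_vectorized = above.groupby(df["Matched Previous ID"]).sum()

print(counts)
print(counts_vectorized)
```

Both results keep ID 21 with a count of 0, because the grouping is done on all rows and the condition is only applied inside each group.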

For completeness, if you want to change the threshold, you can pass it through a lambda:

df.groupby('Matched Previous ID')['Max Score'].apply(lambda g: counter_(g, threshold=0.2))

Matched Previous ID
1.0     1
2.0     1
3.0     1
4.0     2
8.0     1
15.0    1
21.0    1
22.0    1
28.0    1
Name: Max Score, dtype: int64
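
On the original question of whether value_counts can be used with a condition: one possible way, sketched here under the assumption that the data matches the question's table, is to filter the rows first and then call value_counts on the remaining IDs. Filtering drops IDs that never pass the condition, so the sketch reindexes against all IDs to bring them back with a count of 0. The helper name `count_above` is hypothetical.

```python
import pandas as pd

# Data copied by hand from the table in the question.
df = pd.DataFrame({
    "Next ID": [1, 2, 3, 8, 15, 21, 22, 28, 94, 96],
    "Max Score": [0.893201, 0.858763, 0.902589, 0.806605, 0.867527,
                  0.536942, 0.909944, 0.828891, 0.223704, 0.583676],
    "Matched Previous ID": [1, 2, 3, 8, 15, 21, 22, 28, 4, 4],
})

def count_above(frame, threshold=0.55):
    # Hypothetical helper: value_counts restricted to rows above the threshold.
    ids = frame["Matched Previous ID"]
    # Keep only the rows that satisfy the condition, then count each ID.
    filtered = frame.loc[frame["Max Score"] > threshold, "Matched Previous ID"]
    counts = filtered.value_counts()
    # Restore IDs that were filtered out entirely, giving them a count of 0.
    return counts.reindex(ids.unique(), fill_value=0).sort_index()

result_055 = count_above(df)                 # ID 4 -> 1, ID 21 -> 0
result_020 = count_above(df, threshold=0.2)  # ID 4 -> 2, ID 21 -> 1

print(result_055)
print(result_020)
```

Without the reindex step, ID 21 would simply be missing from the result at the 0.55 threshold instead of appearing with 0.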