Home > Back-end >  Count occurrences of specific value in column based on categories of another column
Count occurrences of specific value in column based on categories of another column

Time:11-04

I have a dataset that looks like this:

Categories | Clicks 
1          |   1
1          |   3
1          |   2
2          |   2 
2          |   1
2          |   1
2          |   2
3          |   1
3          |   2
3          |   3
4          |   2
4          |   1

And to make some bar plots I would like for it to look like this:

Categories | Clicks_count | Clicks_prob
1          |  1           |  33%
2          |  2           |  50%
3          |  1           |  33%
4          |  1           |  50%

so basically: grouping by Categories and calculating on Clicks_count the number of times per category that Clicks takes the value 1, and Clicks_prob the probability of Clicks taking the value 1 (so it's count of Clicks==1/Count of Category i observations)

How could I do this? I tried, to get the second column:

df.groupby("Categories")["Clicks"].count().reset_index()

but the result is:

Categories | Clicks 
1          | 3
2          | 4
3          | 3
4          | 2

CodePudding user response:

Try sum and mean on the condition Clicks==1. Since you're working with groups, put them in groupby:

df['Clicks'].eq(1).groupby(df['Categories']).agg(['sum','mean'])

Output:

            sum      mean
Categories               
1             1  0.333333
2             2  0.500000
3             1  0.333333
4             1  0.500000

To match output's naming, use named aggregation:

df['Clicks'].eq(1).groupby(df['Categories']).agg(Click_counts='sum', Clicks_prob='mean')

Output:

            Click_counts  Clicks_prob
Categories                           
1                      1     0.333333
2                      2     0.500000
3                      1     0.333333
4                      1     0.500000
  • Related