I have a dataset that looks like this:
Categories | Clicks
1 | 1
1 | 3
1 | 2
2 | 2
2 | 1
2 | 1
2 | 2
3 | 1
3 | 2
3 | 3
4 | 2
4 | 1
And to make some bar plots I would like for it to look like this:
Categories | Clicks_count | Clicks_prob
1 | 1 | 33%
2 | 2 | 50%
3 | 1 | 33%
4 | 1 | 50%
so basically: grouping by Categories
and calculating on Clicks_count
the number of times per category that Clicks
takes the value 1, and Clicks_prob
the probability of Clicks
taking the value 1 (so it's count of Clicks==1/Count of Category i observations)
How could I do this? I tried, to get the second column:
df.groupby("Categories")["Clicks"].count().reset_index()
but the result is:
Categories | Clicks
1 | 3
2 | 4
3 | 3
4 | 2
CodePudding user response:
Try sum
and mean
on the condition Clicks==1
. Since you're working with groups, put them in groupby:
df['Clicks'].eq(1).groupby(df['Categories']).agg(['sum','mean'])
Output:
sum mean
Categories
1 1 0.333333
2 2 0.500000
3 1 0.333333
4 1 0.500000
To match output's naming, use named aggregation:
df['Clicks'].eq(1).groupby(df['Categories']).agg(Click_counts='sum', Clicks_prob='mean')
Output:
Click_counts Clicks_prob
Categories
1 1 0.333333
2 2 0.500000
3 1 0.333333
4 1 0.500000