I have a DataFrame in pandas like the one below:
Data type:
GROUP - int
TARGET - int
GROUP | TARGET
------|-------
0-5   | 1
0-5   | 0
20-25 | 1
40-45 | 1
...   | ...
I need to compute the following for each group: df[df["TARGET"] == 1].shape[0] / df.shape[0]
As a result, I need something like this:
GROUP | result | percent
------|---------|---------
0-5 | 0.005 | 0.5%
5-10 | 0.0093 | 0.93%
10-15 | 0.042 | 4.2%
15-20 | ... |
20-25 | ... |
25-30 | ... |
30-35 | ... |
35-40 | ... |
40-45 | ... |
45-50 | ... |
50-55 | ... |
55-60 | ... |
60-65 | ... |
65-70 | ... |
70-75 | ... |
75-80 | ... |
80-85 | ... |
85-90 | ... |
90-95 | ... |
95-100| ... |
How can I do that in pandas?
CodePudding user response:
If TARGET contains only 0 and 1, use the mean aggregation:
df1 = df.groupby("GROUP", as_index=False)["TARGET"].mean()
If other values are possible:
df1 = df["TARGET"].eq(1).groupby(df["GROUP"]).mean().reset_index()
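A minimal sketch of the second variant, extended with the result and percent columns from the question. The sample data is made up for illustration (including the stray value 2), and the percent formatting is only one possible choice:

```python
import pandas as pd

# Made-up sample data; the value 2 shows why eq(1) is safer than a plain mean.
df = pd.DataFrame({
    "GROUP": ["0-5", "0-5", "0-5", "0-5", "20-25", "20-25", "40-45"],
    "TARGET": [1, 0, 2, 0, 1, 1, 0],
})

# Share of rows with TARGET == 1 within each group.
result = (
    df["TARGET"].eq(1)       # boolean Series: True where TARGET is exactly 1
    .groupby(df["GROUP"])    # group the booleans by the GROUP column
    .mean()                  # mean of booleans = fraction of True values
    .rename("result")
    .reset_index()
)

# Percent column as text, e.g. "25.0%".
result["percent"] = (result["result"] * 100).round(2).astype(str) + "%"
print(result)
```

With this data a plain mean of TARGET for group 0-5 would give 0.75, while the eq(1) version gives 0.25, which matches the formula in the question.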
CodePudding user response:
First group by GROUP, get the sum and count of TARGET, then divide. Something like this, maybe:
df1 = df.groupby("GROUP")["TARGET"].agg(['sum', 'count']).reset_index()
df1["result"] = df1["sum"] / df1["count"]
del df1["sum"]
del df1["count"]
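The same steps as a self-contained sketch, using the four sample rows from the question. It assumes TARGET holds only 0 and 1, since the sum equals the number of ones only in that case:

```python
import pandas as pd

# The four sample rows shown in the question.
df = pd.DataFrame({
    "GROUP": ["0-5", "0-5", "20-25", "40-45"],
    "TARGET": [1, 0, 1, 1],
})

# Per-group sum (number of ones, assuming 0/1 data) and row count.
df1 = df.groupby("GROUP")["TARGET"].agg(["sum", "count"]).reset_index()

# Ratio of ones to all rows in each group.
df1["result"] = df1["sum"] / df1["count"]

# Drop the helper columns in one call instead of two del statements.
df1 = df1.drop(columns=["sum", "count"])
print(df1)
```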
Hope this Helps...
CodePudding user response:
I'm making a couple of assumptions about your data source, namely that you are trying to calculate a success rate for a series of binomial (1/0) trials.
trials_df = experiment_df.groupby('group')['target'].agg(['sum', 'count'])
will produce something like:
       sum  count
group
0-5      5     14
11-15    5     10
16-20    5      8
6-10     3      8
You can clean up and get to your success rate like this:
trials_df.reset_index(inplace=True)
trials_df.columns = ['group', 'successes', 'trials']
trials_df['success_rate'] = trials_df['successes'] / trials_df['trials']
   group  successes  trials  success_rate
0    0-5          5      14      0.357143
1  11-15          5      10      0.500000
2  16-20          5       8      0.625000
3   6-10          3       8      0.375000
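Note that the groups above come out in string order (6-10 after 16-20). If numeric order is wanted, one option is to sort by the lower bound of each label. This is only a sketch: it rebuilds the table shown above by hand and assumes every label has the form "low-high":

```python
import pandas as pd

# The aggregated table from above, rebuilt by hand for illustration.
trials_df = pd.DataFrame({
    "group": ["0-5", "11-15", "16-20", "6-10"],
    "successes": [5, 5, 5, 3],
    "trials": [14, 10, 8, 8],
})
trials_df["success_rate"] = trials_df["successes"] / trials_df["trials"]

# Sort by the integer before the dash rather than by the raw string.
trials_df = trials_df.sort_values(
    "group",
    key=lambda s: s.str.split("-").str[0].astype(int),
    ignore_index=True,
)
print(trials_df)
```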