How to aggregate percent for each group based on target and shape of DataFrame in Python Pandas?-CodePudding

I have DataFrame in Python Pandas like below:

Data type:

GROUP - int
TARGET - int

GROUP TARGET

0-5 1

0-5 0

20-25 1

40-45 1

... ...

GROUP	TARGET
0-5	1
0-5	0
20-25	1
40-45	1
...	...

And I need to make result of the following calculation: df[df["TARGET"]==1].shape[0]] / df.shape[0] for each group. So as a result I need something like below:

GROUP | result  | percent
------|---------|---------
0-5   | 0.005   | 0.5%
5-10  | 0.0093  | 0.93%
10-15 | 0.042   | 4.2%
15-20 | ...     |
20-25 | ...     |
25-30 | ...     |
30-35 | ...     |
35-40 | ...     |
40-45 | ...     |
45-50 | ...     |
50-55 | ...     |
55-60 | ...     |
60-65 | ...     |
65-70 | ...     |
70-75 | ...     |
75-80 | ...     |
80-85 | ...     |
85-90 | ...     |
90-95 | ...     |
95-100| ...     |

How can I do that in Python Pandas ?

CodePudding user response：

If there is only 0,1 use aggregation mean:

df1 = df.groupby("GROUP", as_index=False)["TARGET"].mean()

If possible another values:

df1 = df["TARGET"].eq(1).groupby(df["GROUP"]).mean().reset_index()

CodePudding user response：

First Groupby & get sum & count then take division... Something like this maybe;

df1 = df.groupby("GROUP")["TARGET"].agg({'sum','count'}).reset_index()
df1["result"] = df1["sum"] / df1["count"]
del df1["sum"]
del df1["count"]

Hope this Helps...

CodePudding user response：

I'm making a couple of assumptions on your data source, and that you are trying to calculate a success rate for a series of binomial (1/0) trials.

trials_df = experiment_df.groupby('group', as_index=False)['target'].agg(['sum', 'count'])

will produce something like:

       sum  count
group            
0-5      5     14
11-15    5     10
16-20    5      8
6-10     3      8

You can clean up and get to your success rate like this:

trials_df.reset_index(inplace=True)
trials_df.columns = ['group', 'successes', 'trials']
trials_df['success_rate'] = trials_df['successes'] / trials_df['trials']

   group  successes  trials  success_rate
0    0-5          5      14      0.357143
1  11-15          5      10      0.500000
2  16-20          5       8      0.625000
3   6-10          3       8      0.375000