Home > Back-end >  How to aggregate percent for each group based on target and shape of DataFrame in Python Pandas?
How to aggregate percent for each group based on target and shape of DataFrame in Python Pandas?

Time:09-28

I have DataFrame in Python Pandas like below:

Data type:

  • GROUP - int

  • TARGET - int

    GROUP TARGET
    0-5 1
    0-5 0
    20-25 1
    40-45 1
    ... ...

And I need to make result of the following calculation: df[df["TARGET"]==1].shape[0]] / df.shape[0] for each group. So as a result I need something like below:

GROUP | result  | percent
------|---------|---------
0-5   | 0.005   | 0.5%
5-10  | 0.0093  | 0.93%
10-15 | 0.042   | 4.2%
15-20 | ...     |
20-25 | ...     |
25-30 | ...     |
30-35 | ...     |
35-40 | ...     |
40-45 | ...     |
45-50 | ...     |
50-55 | ...     |
55-60 | ...     |
60-65 | ...     |
65-70 | ...     |
70-75 | ...     |
75-80 | ...     |
80-85 | ...     |
85-90 | ...     |
90-95 | ...     |
95-100| ...     |

How can I do that in Python Pandas ?

CodePudding user response:

If there is only 0,1 use aggregation mean:

df1 = df.groupby("GROUP", as_index=False)["TARGET"].mean()

If possible another values:

df1 = df["TARGET"].eq(1).groupby(df["GROUP"]).mean().reset_index()

CodePudding user response:

First Groupby & get sum & count then take division... Something like this maybe;

df1 = df.groupby("GROUP")["TARGET"].agg({'sum','count'}).reset_index()
df1["result"] = df1["sum"] / df1["count"]
del df1["sum"]
del df1["count"]

Hope this Helps...

CodePudding user response:

I'm making a couple of assumptions on your data source, and that you are trying to calculate a success rate for a series of binomial (1/0) trials.

trials_df = experiment_df.groupby('group', as_index=False)['target'].agg(['sum', 'count'])

will produce something like:

       sum  count
group            
0-5      5     14
11-15    5     10
16-20    5      8
6-10     3      8

You can clean up and get to your success rate like this:

trials_df.reset_index(inplace=True)
trials_df.columns = ['group', 'successes', 'trials']
trials_df['success_rate'] = trials_df['successes'] / trials_df['trials']

   group  successes  trials  success_rate
0    0-5          5      14      0.357143
1  11-15          5      10      0.500000
2  16-20          5       8      0.625000
3   6-10          3       8      0.375000
  • Related