Home > Net >  How to find percentage after group by in Pandas?
How to find percentage after group by in Pandas?

Time:12-14

I want this to be in a form where I get Kept %, Broken % and F_P2P% for each amount bucket. Example: Desired Output

due_amount_quantiles    Kept %  Broken % F_P2P Unique User Id
(-0.001, 767.91]          5         10     85     100
(767.91, 1477.935]        10        20     70     100

Current Output

due_amount_quantiles    P2P_Status  Unique User Id
0   (-0.001, 767.91]    Broken  60823
1   (-0.001, 767.91]    F_P2P   1324
2   (-0.001, 767.91]    Kept    32458
3   (767.91, 1477.935]  Broken  68638
4   (767.91, 1477.935]  F_P2P   1477
5   (767.91, 1477.935]  Kept    24489
6   (1477.935, 2853.447]    Broken  69048
7   (1477.935, 2853.447]    F_P2P   2106
8   (1477.935, 2853.447]    Kept    23450
9   (2853.447, 25175.06]    Broken  63811

How do I achieve this? unstack did not work.

CodePudding user response:

Use DataFrame.pivot with divide sum values, if duplicates use DataFrame.pivot_table:

#if there are unique due_amount_quantiles, P2P_Status tuples
df = df.pivot('due_amount_quantiles','P2P_Status','Unique User Id')

#if there ara duplicated due_amount_quantiles, P2P_Status tuples is necessary aggregate, here by `sum`
df = df.pivot_table(index='due_amount_quantiles',columns='P2P_Status',values='Unique User Id', aggfunc='sum')

df = df.div(df.sum(axis=1), axis=0).add_prefix('% ').mul(100).assign(Total = 100)
print (df)
P2P_Status              % Broken   % F_P2P     % Kept  Total
due_amount_quantiles                                        
(-0.001, 767.91]       64.291528  1.399503  34.308969    100
(1477.935, 2853.447]   72.986343  2.226122  24.787535    100
(2853.447, 25175.06]  100.000000       NaN        NaN    100
(767.91, 1477.935]     72.552958  1.561245  25.885798    100

Or:

df = pd.crosstab(index=df['due_amount_quantiles'],
                 columns=df['P2P_Status'],
                 values=df['Unique User Id'], 
                 aggfunc='sum',
                 normalize='index').add_prefix('% ').mul(100).assign(Total = 100)

print (df)
P2P_Status              % Broken   % F_P2P     % Kept  Total
due_amount_quantiles                                        
(-0.001, 767.91]       64.291528  1.399503  34.308969    100
(1477.935, 2853.447]   72.986343  2.226122  24.787535    100
(2853.447, 25175.06]  100.000000  0.000000   0.000000    100
(767.91, 1477.935]     72.552958  1.561245  25.885798    100
  • Related