I want this to be in a form where I get Kept %, Broken % and F_P2P% for each amount bucket. Example: Desired Output
due_amount_quantiles Kept % Broken % F_P2P Unique User Id
(-0.001, 767.91] 5 10 85 100
(767.91, 1477.935] 10 20 70 100
Current Output
due_amount_quantiles P2P_Status Unique User Id
0 (-0.001, 767.91] Broken 60823
1 (-0.001, 767.91] F_P2P 1324
2 (-0.001, 767.91] Kept 32458
3 (767.91, 1477.935] Broken 68638
4 (767.91, 1477.935] F_P2P 1477
5 (767.91, 1477.935] Kept 24489
6 (1477.935, 2853.447] Broken 69048
7 (1477.935, 2853.447] F_P2P 2106
8 (1477.935, 2853.447] Kept 23450
9 (2853.447, 25175.06] Broken 63811
How do I achieve this? unstack did not work.
CodePudding user response:
Use DataFrame.pivot
with divide sum values, if duplicates use DataFrame.pivot_table
:
#if there are unique due_amount_quantiles, P2P_Status tuples
df = df.pivot('due_amount_quantiles','P2P_Status','Unique User Id')
#if there ara duplicated due_amount_quantiles, P2P_Status tuples is necessary aggregate, here by `sum`
df = df.pivot_table(index='due_amount_quantiles',columns='P2P_Status',values='Unique User Id', aggfunc='sum')
df = df.div(df.sum(axis=1), axis=0).add_prefix('% ').mul(100).assign(Total = 100)
print (df)
P2P_Status % Broken % F_P2P % Kept Total
due_amount_quantiles
(-0.001, 767.91] 64.291528 1.399503 34.308969 100
(1477.935, 2853.447] 72.986343 2.226122 24.787535 100
(2853.447, 25175.06] 100.000000 NaN NaN 100
(767.91, 1477.935] 72.552958 1.561245 25.885798 100
Or:
df = pd.crosstab(index=df['due_amount_quantiles'],
columns=df['P2P_Status'],
values=df['Unique User Id'],
aggfunc='sum',
normalize='index').add_prefix('% ').mul(100).assign(Total = 100)
print (df)
P2P_Status % Broken % F_P2P % Kept Total
due_amount_quantiles
(-0.001, 767.91] 64.291528 1.399503 34.308969 100
(1477.935, 2853.447] 72.986343 2.226122 24.787535 100
(2853.447, 25175.06] 100.000000 0.000000 0.000000 100
(767.91, 1477.935] 72.552958 1.561245 25.885798 100