Home > Software design >  How to pivot a groupby/agg table but maintaining the stats in columns?
How to pivot a groupby/agg table but maintaining the stats in columns?

Time:05-08

I used the following code to generate a summary stat table

(
    master[(master["decile"]==9) | (master["decile"]==0)]
    [['assets', 'age','diversification', 'decile']]
                          .groupby('decile').agg(['mean', 'median', 'std'])
)

My table looks like the following

               assets                       age                diversification         ... many other variables...
        mean    median   std      mean   median  std       mean   median      std
decile                                  
0      87.035   86.74   6.463   4.551084    3   4.53     0.394525   0.0    1.965721
9     961258.37 174.90  1.959   7.521477    5   6.26     8.008314   2.0    19.625135

I was hoping to get the variables in rows since I have 10 columns.. i don't want my table to become too wide. is there a way to do that?

desired output

                     decile 0                         decile 9
                    mean    median    std           mean     median  std
assets             87.035    86.74   6.463        61258.37 1 74.90   1.959

age               4.551084    3      4.53         7.521477    5       6.26

diversification  0.394525   0.0    1.965721       8.008314   2.0    19.625135

var4

var5

...         

I tried using pivot but it puts the summary stats in one column and it does not look pleasing. I want exactly this format, if possible? thanks!

CodePudding user response:

UPDATED to provide format specified in question:

You can do this:

df = df.stack().T
df.columns = pd.MultiIndex.from_tuples([(f'decile {x[0]}', x[1]) for x in df.columns.to_numpy()], names=[None, None])
print(df)

Sample Input:

# master:
   decile  assets  age  diversification
0       0       1    1                0
1       0       3    3                1
2       0       4   10                3
3       5       5    5                5
4       5       5    5                5
5       5       5    5                5
6       9       6    4                0
7       9       8    4                2
8       9       8    6                2
9       9       9   15               50

# df created using groupby().agg():
          assets                        age                  diversification
            mean median       std      mean median       std            mean median        std
decile
0       2.666667    3.0  1.527525  4.666667    3.0  4.725816        1.333333    1.0   1.527525
9       7.750000    8.0  1.258306  7.250000    5.0  5.251984       13.500000    2.0  24.351591

Output:

                 decile 0                  decile 9
                     mean median       std     mean median        std
assets           2.666667    3.0  1.527525     7.75    8.0   1.258306
age              4.666667    3.0  4.725816     7.25    5.0   5.251984
diversification  1.333333    1.0  1.527525    13.50    2.0  24.351591

CodePudding user response:

Let us do stack and transpose

df.stack().T

decile               0                                  9                     
                     mean median       std              mean     median       std
age              4.551084    3.0      4.53               5.0       6.26  8.008314
assets             87.035  86.74     6.463  961258.37 174.90      1.959  7.521477
diversification  0.394525    0.0  1.965721               2.0  19.625135       NaN
  • Related