pandas describe() and drop index (flatten column names)

Time:11-09

I built a DataFrame and computed summary statistics with describe(), but the result still has nested (MultiIndex) column names. How can I drop the outer level?

import pandas as pd
import numpy as np
df_rand = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df_rand = pd.melt(df_rand, value_vars=list('ABCD'))
df_rand_summary = df_rand.groupby('variable').describe().reset_index(drop=True)

This returns:

   value                                                 
   count   mean        std  min    25%   50%    75%   max
0  100.0  50.01  27.402534  0.0  28.00  52.5  73.00  99.0
1  100.0  49.85  29.836042  0.0  22.75  54.0  79.25  99.0
2  100.0  46.57  30.491017  0.0  19.75  40.0  76.00  99.0
3  100.0  53.27  28.303855  0.0  28.75  55.5  77.25  99.0

However, I would like:

   count   mean        std  min    25%   50%    75%   max
0  100.0  50.01  27.402534  0.0  28.00  52.5  73.00  99.0
1  100.0  49.85  29.836042  0.0  22.75  54.0  79.25  99.0
2  100.0  46.57  30.491017  0.0  19.75  40.0  76.00  99.0
3  100.0  53.27  28.303855  0.0  28.75  55.5  77.25  99.0

or (maybe even better)

  value/count  value/mean   value/std  value/min  value/25%   value/50%   value/75%   value/max
0  100.0        50.01        27.402534  0.0        28.00       52.5        73.00        99.0
1  100.0        49.85        29.836042  0.0        22.75       54.0        79.25        99.0
2  100.0        46.57        30.491017  0.0        19.75       40.0        76.00        99.0
3  100.0        53.27        28.303855  0.0        28.75       55.5        77.25        99.0

CodePudding user response:

df_rand_summary.droplevel(level=0, axis=1)

output:

   count   mean        std  min    25%   50%    75%   max
0  100.0  50.01  27.402534  0.0  28.00  52.5  73.00  99.0
1  100.0  49.85  29.836042  0.0  22.75  54.0  79.25  99.0
2  100.0  46.57  30.491017  0.0  19.75  40.0  76.00  99.0
3  100.0  53.27  28.303855  0.0  28.75  55.5  77.25  99.0
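
Note that droplevel returns a new DataFrame rather than modifying df_rand_summary, so the result has to be assigned to a name to keep it. A minimal sketch, assuming a seeded generator for reproducibility (the names rng, long_df, flat and direct are illustrative, not from the question). It also shows that selecting the 'value' column before calling describe() should avoid the outer level in the first place:

```python
import numpy as np
import pandas as pd

# Seeded generator, used here only so the example is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
long_df = pd.melt(df, value_vars=list("ABCD"))

# Same pipeline as in the question: columns are a two-level MultiIndex
summary = long_df.groupby("variable").describe().reset_index(drop=True)

# droplevel returns a new frame; summary itself keeps its nested columns
flat = summary.droplevel(level=0, axis=1)

# Selecting the single column first yields flat column names directly
direct = long_df.groupby("variable")["value"].describe().reset_index(drop=True)

print(flat.columns.tolist())
print(direct.columns.tolist())
```

Both flat and direct end up with the plain statistic names as columns; the second form is only an option when a single value column is being described.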



or

df_rand_summary.columns = df_rand_summary.columns.map(lambda x: '/'.join(x))

output (df_rand_summary after the assignment):

  value/count  value/mean   value/std  value/min  value/25%   value/50%   value/75%   value/max
0  100.0        50.01        27.402534  0.0        28.00       52.5        73.00        99.0
1  100.0        49.85        29.836042  0.0        22.75       54.0        79.25        99.0
2  100.0        46.57        30.491017  0.0        19.75       40.0        76.00        99.0
3  100.0        53.27        28.303855  0.0        28.75       55.5        77.25        99.0
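
The join approach relies on every level label being a string. As a hedged sketch, a small helper (flatten_columns is a hypothetical name, not a pandas function) can convert each label with str first and leave frames with single-level columns untouched:

```python
import numpy as np
import pandas as pd

# Seeded generator, used here only so the example is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
summary = (
    pd.melt(df, value_vars=list("ABCD"))
    .groupby("variable")
    .describe()
    .reset_index(drop=True)
)


def flatten_columns(frame, sep="/"):
    """Return a copy of frame with MultiIndex columns joined into single strings."""
    out = frame.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # to_flat_index() yields one tuple per column, e.g. ('value', 'count')
        out.columns = [sep.join(map(str, col)) for col in out.columns.to_flat_index()]
    return out


flat = flatten_columns(summary)
print(flat.columns.tolist())
```

Unlike assigning to df_rand_summary.columns directly, this returns a copy, so the original nested frame stays available.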