I made a DataFrame and then calculated some summary statistics with describe(). However, the result still has nested (MultiIndex) column labels. How can I drop or flatten these?
import pandas as pd
import numpy as np
df_rand = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df_rand = pd.melt(df_rand, value_vars=list('ABCD'))
df_rand_summary = df_rand.groupby('variable').describe().reset_index(drop=True)
This returns:
   value
   count   mean        std  min    25%   50%    75%   max
0  100.0  50.01  27.402534  0.0  28.00  52.5  73.00  99.0
1  100.0  49.85  29.836042  0.0  22.75  54.0  79.25  99.0
2  100.0  46.57  30.491017  0.0  19.75  40.0  76.00  99.0
3  100.0  53.27  28.303855  0.0  28.75  55.5  77.25  99.0
However, I would like
   count   mean        std  min    25%   50%    75%   max
0  100.0  50.01  27.402534  0.0  28.00  52.5  73.00  99.0
1  100.0  49.85  29.836042  0.0  22.75  54.0  79.25  99.0
2  100.0  46.57  30.491017  0.0  19.75  40.0  76.00  99.0
3  100.0  53.27  28.303855  0.0  28.75  55.5  77.25  99.0
or (maybe even better)
   value/count  value/mean  value/std  value/min  value/25%  value/50%  value/75%  value/max
0        100.0       50.01  27.402534        0.0      28.00       52.5      73.00       99.0
1        100.0       49.85  29.836042        0.0      22.75       54.0      79.25       99.0
2        100.0       46.57  30.491017        0.0      19.75       40.0      76.00       99.0
3        100.0       53.27  28.303855        0.0      28.75       55.5      77.25       99.0
CodePudding user response:
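To drop the outer column level, use droplevel. It returns a new DataFrame rather than modifying df_rand_summary in place:
df_rand_summary.droplevel(level=0, axis=1)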
output:
   count   mean        std  min    25%   50%    75%   max
0  100.0  50.01  27.402534  0.0  28.00  52.5  73.00  99.0
1  100.0  49.85  29.836042  0.0  22.75  54.0  79.25  99.0
2  100.0  46.57  30.491017  0.0  19.75  40.0  76.00  99.0
3  100.0  53.27  28.303855  0.0  28.75  55.5  77.25  99.0
or, to join the two levels into a single label per column:
df_rand_summary.columns = df_rand_summary.columns.map(lambda x: '/'.join(x))
output (df_rand_summary):
   value/count  value/mean  value/std  value/min  value/25%  value/50%  value/75%  value/max
0        100.0       50.01  27.402534        0.0      28.00       52.5      73.00       99.0
1        100.0       49.85  29.836042        0.0      22.75       54.0      79.25       99.0
2        100.0       46.57  30.491017        0.0      19.75       40.0      76.00       99.0
3        100.0       53.27  28.303855        0.0      28.75       55.5      77.25       99.0
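For reference, here is a minimal end-to-end sketch that puts both options side by side. The names summary, flat_dropped and flat_joined are only illustrative, and the fixed seed is an assumption added so the run is repeatable (your numbers will differ from the tables above).

```python
import numpy as np
import pandas as pd

# Rebuild the example data; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
df_rand = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
df_rand = pd.melt(df_rand, value_vars=list("ABCD"))

# describe() on a groupby produces two-level columns: ('value', 'count'), ...
summary = df_rand.groupby("variable").describe().reset_index(drop=True)

# Option 1: drop the outer level. droplevel returns a new DataFrame,
# so the result has to be assigned to keep it.
flat_dropped = summary.droplevel(level=0, axis=1)

# Option 2: join both levels of each column tuple into one string label.
flat_joined = summary.copy()
flat_joined.columns = ["/".join(col) for col in summary.columns]

print(list(flat_dropped.columns))
print(list(flat_joined.columns))
```

Note that the first option leaves summary itself unchanged, while the second assigns to .columns and therefore changes the frame it is applied to (a copy here).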