I have a pandas multi index dataframe like shown below.
M EM A ... EA M0 EM0
Component EN EZ NZ EN EZ NZ EN EZ ... EZ NZ EN EZ NZ EN EZ NZ
Date ...
2020-07-15 0.001682 0.000963 0.001292 0.000737 0.000635 0.000907 -0.048716 0.022769 ... 0.013103 0.016042 0.003619 0.001009 0.001718 0.000829 0.000685 0.000880
2020-07-16 0.000198 0.000178 -0.001219 0.000586 0.000548 0.000691 0.014514 0.009234 ... 0.010467 0.013240 0.000271 0.000238 -0.001365 0.000592 0.000541 0.000750
2020-07-17 0.000810 -0.000322 -0.000682 0.000445 0.000654 0.000595 0.045604 -0.014437 ... 0.014070 0.011086 0.000966 -0.000665 -0.000886 0.000680 0.000564 0.000584
2020-07-18 0.000287 -0.000887 -0.000329 0.000631 0.000815 0.000534 0.038145 -0.001408 ... 0.016091 0.010116 -0.000147 -0.000928 -0.000342 0.000760 0.000654 0.000519
2020-07-19 0.000805 0.000673 -0.001189 0.000537 0.000513 0.000462 0.051083 0.026574 ... 0.010684 0.009200 0.000746 0.001316 -0.001138 0.000896 0.000493 0.000470
I need to calculate the mean of EN
, EZ
, and NZ
columns under M
and EM
to new columns (e.g. Mean
) (also under M
and EM
, respectively), like shown below.
M EM
Component EN EZ NZ Mean EN EZ NZ Mean
Date
2020-07-15 0.001682 0.000963 0.001292 x.YYYYYY 0.000737 0.000635 0.000907 y.ZZZZZZ
Can someone show me a workaround for this?
Thanks in advance!
CodePudding user response:
Try:
df_mean = pd.concat({'mean': df.groupby(level=0, axis=1).mean()}, axis=1).swaplevel(axis=1)
df = df.join(df_mean).sort_index(level=0, axis=1)
print(df)
# Output:
EM M
Component EN EZ NZ mean EN EZ NZ mean
Date
2020-07-15 4 5 6 5.0 1 2 3 2.0
2020-07-16 14 15 16 15.0 11 12 13 12.0
Setup to be reproducible:
import io
s = """\
,M,M,M,EM,EM,EM
Component,EN,EZ,NZ,EN,EZ,NZ
Date,,,,,,
2020-07-15,1,2,3,4,5,6
2020-07-16,11,12,13,14,15,16
"""
df = pd.read_csv(io.StringIO(s), header=[0, 1], index_col=0)