Home > Enterprise >  Pandas getting the mean of columns in multi-index dataframe
Pandas getting the mean of columns in multi-index dataframe

Time:12-25

I have a pandas multi index dataframe like shown below.

                   M                            EM                             A            ...        EA                  M0                           EM0                    
Component         EN        EZ        NZ        EN        EZ        NZ        EN        EZ  ...        EZ        NZ        EN        EZ        NZ        EN        EZ        NZ
Date                                                                                        ...                                                                                
2020-07-15  0.001682  0.000963  0.001292  0.000737  0.000635  0.000907 -0.048716  0.022769  ...  0.013103  0.016042  0.003619  0.001009  0.001718  0.000829  0.000685  0.000880
2020-07-16  0.000198  0.000178 -0.001219  0.000586  0.000548  0.000691  0.014514  0.009234  ...  0.010467  0.013240  0.000271  0.000238 -0.001365  0.000592  0.000541  0.000750
2020-07-17  0.000810 -0.000322 -0.000682  0.000445  0.000654  0.000595  0.045604 -0.014437  ...  0.014070  0.011086  0.000966 -0.000665 -0.000886  0.000680  0.000564  0.000584
2020-07-18  0.000287 -0.000887 -0.000329  0.000631  0.000815  0.000534  0.038145 -0.001408  ...  0.016091  0.010116 -0.000147 -0.000928 -0.000342  0.000760  0.000654  0.000519
2020-07-19  0.000805  0.000673 -0.001189  0.000537  0.000513  0.000462  0.051083  0.026574  ...  0.010684  0.009200  0.000746  0.001316 -0.001138  0.000896  0.000493  0.000470

I need to calculate the mean of EN, EZ, and NZ columns under M and EM to new columns (e.g. Mean) (also under M and EM, respectively), like shown below.

                  M                                         EM                                                   
Component         EN        EZ        NZ       Mean         EN        EZ        NZ        Mean        
Date                                                                                                                                                          
2020-07-15  0.001682  0.000963  0.001292   x.YYYYYY   0.000737  0.000635  0.000907    y.ZZZZZZ  

Can someone show me a workaround for this?

Thanks in advance!

CodePudding user response:

Try:

df_mean = pd.concat({'mean': df.groupby(level=0, axis=1).mean()}, axis=1).swaplevel(axis=1)
df = df.join(df_mean).sort_index(level=0, axis=1)
print(df)

# Output:
            EM                 M              
Component   EN  EZ  NZ  mean  EN  EZ  NZ  mean
Date                                          
2020-07-15   4   5   6   5.0   1   2   3   2.0
2020-07-16  14  15  16  15.0  11  12  13  12.0

Setup to be reproducible:

import io

s = """\
,M,M,M,EM,EM,EM
Component,EN,EZ,NZ,EN,EZ,NZ
Date,,,,,,
2020-07-15,1,2,3,4,5,6
2020-07-16,11,12,13,14,15,16
"""
df = pd.read_csv(io.StringIO(s), header=[0, 1], index_col=0)
  • Related