The objective is to calculate subset column averages based on multiple conditions in a multi-index dataframe.
The first condition is to get the average grouped by the first level of the multi-index.
The second condition is to get the average based on the dict_ref value below:
dict_ref = dict(occ=['F2', 'F4'], gr=['Fp1', 'Fpx'])
For example, for the key occ, get the average of F2 and F4.
The code below should do the job:
import re
import numpy as np
import pandas as pd

np.random.seed(0)
dict_ref = dict(occ=['F2', 'F4'], gr=['Fp1', 'Fpx'])
_names = ['pow_fr', 'pow_fr', 'pow_fr', 'pow_fr', 'pow_fr', 'pow_fr', 'pow_fr', 'pow_fr',
          'hjor_com', 'hjor_com', 'hjor_com', 'hjor_com']
_idx = ['Fp1_band0', 'Fp1_band1', 'Fpx_band0', 'Fpx_band1', 'F2_band0', 'F2_band1', 'F4_band0', 'F4_band1',
        'Fp1', 'Fpx', 'F2', 'F4']
X = np.random.rand(4, len(_names))
columns = pd.MultiIndex.from_arrays([_names, _idx])
df = pd.DataFrame(data=X, columns=columns)

# split 'Fp1_band0' into ('Fp1', 'band0') and pad shorter tuples with ''
remove_nan = [(e[0], *re.split('_', e[1])) for e in df.columns]
remove_nan = [t + ('',) * (len(max(remove_nan, key=len)) - len(t)) for t in remove_nan]
df.columns = pd.MultiIndex.from_tuples(remove_nan)
df = df.T.reset_index().rename(columns={"level_0": "group_feature",
                                        "level_1": "ch", "level_2": "feature", "level_3": "region"})
all_df = []
for nref in dict_ref:
    # numeric_only=True keeps the string column 'ch' out of the mean (needed on pandas >= 2.0)
    df_ch = df[df.ch.isin(dict_ref[nref])].groupby(["group_feature", "feature"]).mean(numeric_only=True).reset_index()
    df_ch['ch'] = nref
    all_df.append(df_ch)
df1 = pd.concat([df, *all_df]).pivot_table(index=['group_feature', 'ch', 'feature']).transpose()
df1.columns = [(gf[0], f'{gf[1]}' if not gf[-1] else f'{gf[1]}_{gf[-1]}') for gf in df1.columns.values.tolist()]
However, I wonder whether there is a way to avoid the for-loop:

all_df = []
for nref in dict_ref:
    df_ch = df[df.ch.isin(dict_ref[nref])].groupby(["group_feature", "feature"]).mean(numeric_only=True).reset_index()
    df_ch['ch'] = nref
    all_df.append(df_ch)
Not as important as the above question, but it would be a bonus if there is a way to avoid the following lines altogether:

remove_nan = [(e[0], *re.split('_', e[1])) for e in df.columns]
remove_nan = [t + ('',) * (len(max(remove_nan, key=len)) - len(t)) for t in remove_nan]
df.columns = pd.MultiIndex.from_tuples(remove_nan)
df = df.T.reset_index().rename(columns={"level_0": "group_feature",
                                        "level_1": "ch", "level_2": "feature", "level_3": "region"})
Expected Output
(hjor_com, F2) (hjor_com, F4) ... (pow_fr, occ_band0) (pow_fr, occ_band1)
0 0.791725 0.528895 ... 0.430621 0.768834
1 0.461479 0.780529 ... 0.399188 0.851316
2 0.018790 0.617635 ... 0.393202 0.594448
3 0.210383 0.128926 ... 0.528570 0.248629
[4 rows x 18 columns]
CodePudding user response:
You can flip dict_ref so that each item in a value list becomes a key, perform a replacement, and group by the new ch:
mapping = {
    v: key for key, value in dict_ref.items() for v in value
}
all_df = df.replace({"ch": mapping}).groupby(["group_feature", "feature", "ch"]).mean().reset_index()
df1 = pd.concat([df, all_df])...
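A complete, runnable sketch of that approach, reusing the question's setup; the long-format reshape, pivot, and column-renaming steps are taken from the question, while the single replace-then-groupby stands in for the loop:

```python
import re
import numpy as np
import pandas as pd

np.random.seed(0)
dict_ref = dict(occ=['F2', 'F4'], gr=['Fp1', 'Fpx'])

_names = ['pow_fr'] * 8 + ['hjor_com'] * 4
_idx = ['Fp1_band0', 'Fp1_band1', 'Fpx_band0', 'Fpx_band1',
        'F2_band0', 'F2_band1', 'F4_band0', 'F4_band1',
        'Fp1', 'Fpx', 'F2', 'F4']
X = np.random.rand(4, len(_names))
df = pd.DataFrame(data=X, columns=pd.MultiIndex.from_arrays([_names, _idx]))

# reshape to long format: one row per (group_feature, ch, feature)
tuples = [(name, *re.split('_', ch)) for name, ch in df.columns]
width = len(max(tuples, key=len))
tuples = [t + ('',) * (width - len(t)) for t in tuples]
df.columns = pd.MultiIndex.from_tuples(tuples)
df = df.T.reset_index().rename(columns={"level_0": "group_feature",
                                        "level_1": "ch", "level_2": "feature"})

# flip dict_ref so each channel points to its group key: {'F2': 'occ', ...}
mapping = {v: key for key, value in dict_ref.items() for v in value}

# replace channel names by their group, then a single groupby replaces the loop
all_df = (df.replace({"ch": mapping})
            .groupby(["group_feature", "feature", "ch"])
            .mean()
            .reset_index())

df1 = (pd.concat([df, all_df])
         .pivot_table(index=['group_feature', 'ch', 'feature'])
         .transpose())
df1.columns = [(gf, ch if not feat else f'{ch}_{feat}')
               for gf, ch, feat in df1.columns]
print(df1.shape)  # (4, 18)
```

With seed 0 this reproduces the expected output, e.g. (pow_fr, occ_band0) in row 0 is the mean of the row-0 values of F2_band0 and F4_band0.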