How to group by multi-indexed columns with pandas while keeping the column structure?

I have a DataFrame with multi-indexed columns that I would like to group by level 0 AND level 1. Some columns are duplicated, and I would like to sum their values. How can I group by the columns without dropping one of the levels? This is what I have tried, but each attempt removes one level.

Grouping on level 1 drops level 0.

data.groupby(level=1, axis=1).sum().columns
Index(['ACA', 'DOT', 'KSM', 'MOVR'], dtype='object')

Grouping on level 0 drops level 1.

data.groupby(level=0, axis=1).sum().columns
Index(['last', 'quoteVolume'], dtype='object')

The columns:

MultiIndex([(       'last',  'DOT'),
            ('quoteVolume',  'DOT'),
            (       'last',  'DOT'),
            ('quoteVolume',  'DOT'),
            (       'last',  'KSM'),
            ('quoteVolume',  'KSM'),
            (       'last',  'KSM'),
            ('quoteVolume',  'KSM'),
            (       'last', 'MOVR'),
            ('quoteVolume', 'MOVR'),
            (       'last', 'MOVR'),
            ('quoteVolume', 'MOVR'),
            (       'last',  'ACA'),
            ('quoteVolume',  'ACA')],
           )

How can I do that?

The expected columns after grouping are:

MultiIndex([(       'last',  'DOT'),
            ('quoteVolume',  'DOT'),
            (       'last',  'KSM'),
            ('quoteVolume',  'KSM'),
            (       'last', 'MOVR'),
            ('quoteVolume', 'MOVR'),
            (       'last',  'ACA'),
            ('quoteVolume',  'ACA')],
           )

CodePudding user response：

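You can group by the full column tuples instead of a single level. The result then has a flat index of tuples as columns, which you can turn back into a MultiIndex with pd.MultiIndex.from_tuples:

# group columns that share the same (level 0, level 1) tuple and sum them
out = df.groupby(df.columns, axis=1).sum()
# rebuild the two-level column index from the tuple labels
out.columns = pd.MultiIndex.from_tuples(out.columns)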

print(out)
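
To illustrate only the second step of this answer, here is a minimal sketch of how a flat index of tuples is converted back into a MultiIndex. The labels are taken from the question; the variable names (cols, flat, rebuilt) are my own, and to_flat_index is used here just to produce a flat index of tuples to start from.

```python
import pandas as pd

# Two-level column labels, as in the question
cols = pd.MultiIndex.from_tuples([
    ("last", "DOT"), ("quoteVolume", "DOT"),
    ("last", "KSM"), ("quoteVolume", "KSM"),
])

# A flat (single-level) Index whose elements are tuples
flat = cols.to_flat_index()

# from_tuples splits each tuple back into one label per level
rebuilt = pd.MultiIndex.from_tuples(flat)

print(type(flat).__name__, flat.nlevels)
print(type(rebuilt).__name__, rebuilt.nlevels)
```

The rebuilt index has two levels again, so selections such as out["last"] work as they did on the original frame.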

CodePudding user response：

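The level parameter accepts a list, so you can group by both levels at once with level=[0, 1]. The sort_index call then orders the result by level 1, which is why the columns below are sorted alphabetically by symbol rather than in their original order:

df.groupby(level=[0, 1], axis=1).sum().sort_index(axis=1, level=1).columns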

MultiIndex([(       'last',  'ACA'),
            ('quoteVolume',  'ACA'),
            (       'last',  'DOT'),
            ('quoteVolume',  'DOT'),
            (       'last',  'KSM'),
            ('quoteVolume',  'KSM'),
            (       'last', 'MOVR'),
            ('quoteVolume', 'MOVR')],
           )
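
If you want the columns in their original order of appearance, as in the expected output of the question, one possible variant is to transpose, group the rows with sort=False, and transpose back. This also avoids the axis=1 argument, which is deprecated in recent pandas releases. The sketch below uses a small made-up frame (the numbers are illustrative, not the asker's data), and note that transposing a frame with mixed column dtypes can change the dtypes.

```python
import pandas as pd

# Illustrative frame with duplicated (level 0, level 1) column pairs
cols = pd.MultiIndex.from_tuples([
    ("last", "DOT"), ("quoteVolume", "DOT"),
    ("last", "DOT"), ("quoteVolume", "DOT"),
    ("last", "KSM"), ("quoteVolume", "KSM"),
    ("last", "ACA"), ("quoteVolume", "ACA"),
])
data = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8],
     [10, 20, 30, 40, 50, 60, 70, 80]],
    columns=cols,
)

# Transpose so the column MultiIndex becomes the row index,
# group on both levels without sorting, sum duplicates, transpose back
out = data.T.groupby(level=[0, 1], sort=False).sum().T

print(out.columns.tolist())
print(out)
```

With sort=False the groups keep the order in which each (level 0, level 1) pair first appears, so no sort_index call is needed afterwards.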