I have a DataFrame with MultiIndex columns that I would like to group by level 0 AND level 1. Some columns are duplicated, and I would like to sum their values. How can I group without dropping the other level? This is what I have tried, but each attempt removes one of the levels.
Level 0 is dropped:
data.groupby(level=1, axis=1).sum().columns
Index(['ACA', 'DOT', 'KSM', 'MOVR'], dtype='object')
Level 1 is dropped:
data.groupby(level=0, axis=1).sum().columns
Index(['last', 'quoteVolume'], dtype='object')
The columns:
MultiIndex([( 'last', 'DOT'),
('quoteVolume', 'DOT'),
( 'last', 'DOT'),
('quoteVolume', 'DOT'),
( 'last', 'KSM'),
('quoteVolume', 'KSM'),
( 'last', 'KSM'),
('quoteVolume', 'KSM'),
( 'last', 'MOVR'),
('quoteVolume', 'MOVR'),
( 'last', 'MOVR'),
('quoteVolume', 'MOVR'),
( 'last', 'ACA'),
('quoteVolume', 'ACA')],
)
How can I do that?
The expected output columns are:
MultiIndex([( 'last', 'DOT'),
('quoteVolume', 'DOT'),
( 'last', 'KSM'),
('quoteVolume', 'KSM'),
( 'last', 'MOVR'),
('quoteVolume', 'MOVR'),
( 'last', 'ACA'),
('quoteVolume', 'ACA')],
)
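A minimal reproducible setup may help when trying the answers. The sketch below assumes made-up numbers (only the column labels come from the question) and groups the transposed frame instead of calling groupby(..., axis=1), because the axis argument of DataFrame.groupby is deprecated as of pandas 2.1. It shows that grouping by a single level keeps only that level's labels:

```python
import pandas as pd

# Column labels copied from the question; the numbers are invented.
cols = pd.MultiIndex.from_tuples([
    ("last", "DOT"), ("quoteVolume", "DOT"),
    ("last", "DOT"), ("quoteVolume", "DOT"),
    ("last", "KSM"), ("quoteVolume", "KSM"),
    ("last", "KSM"), ("quoteVolume", "KSM"),
    ("last", "MOVR"), ("quoteVolume", "MOVR"),
    ("last", "MOVR"), ("quoteVolume", "MOVR"),
    ("last", "ACA"), ("quoteVolume", "ACA"),
])
data = pd.DataFrame([list(range(14))], columns=cols)

# Later occurrences of a repeated (level 0, level 1) pair are flagged True.
print(int(data.columns.duplicated().sum()))  # 6

# Transpose so the column MultiIndex becomes the row index, group the rows
# by a single level, then transpose back.
by_ticker = data.T.groupby(level=1).sum().T  # only level 1 survives
by_field = data.T.groupby(level=0).sum().T   # only level 0 survives

print(by_ticker.columns.tolist())  # ['ACA', 'DOT', 'KSM', 'MOVR']
print(by_field.columns.tolist())   # ['last', 'quoteVolume']
```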
Answer:
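You can group by the full column tuples (df.columns, where df is the question's DataFrame) instead of a single level, then set the columns back to a MultiIndex with pd.MultiIndex.from_tuples:
import pandas as pd
# Each group key is a whole (level 0, level 1) tuple, so duplicated columns are summed together
out = df.groupby(df.columns, axis=1).sum()
# Rebuild the two levels from the (level 0, level 1) tuple labels
out.columns = pd.MultiIndex.from_tuples(out.columns)
print(out)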
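The second step of this answer relies on pd.MultiIndex.from_tuples splitting each (level 0, level 1) tuple into two separate levels. A small sketch of that step on its own, assuming a flat Index of tuples like the one the answer describes (labels taken from the question; tupleize_cols=False only stops pandas from building a MultiIndex automatically):

```python
import pandas as pd

# A single-level Index whose labels happen to be tuples.
flat = pd.Index([("last", "DOT"), ("quoteVolume", "DOT")], tupleize_cols=False)
print(flat.nlevels)  # 1

# from_tuples turns each tuple position into its own level.
mi = pd.MultiIndex.from_tuples(flat)
print(mi.nlevels)                    # 2
print(list(mi.get_level_values(0)))  # ['last', 'quoteVolume']
print(list(mi.get_level_values(1)))  # ['DOT', 'DOT']
```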
Answer:
The level parameter accepts a list, so just group by [0, 1]:
df.groupby(level=[0,1], axis=1).sum().sort_index(axis=1, level=1).columns
MultiIndex([( 'last', 'ACA'),
('quoteVolume', 'ACA'),
( 'last', 'DOT'),
('quoteVolume', 'DOT'),
( 'last', 'KSM'),
('quoteVolume', 'KSM'),
( 'last', 'MOVR'),
('quoteVolume', 'MOVR')],
)
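groupby sorts the group keys by default, which is why the result is alphabetical rather than in the question's DOT, KSM, MOVR, ACA order. A sketch, assuming invented values and the question's column labels, that passes sort=False so groups keep their order of first appearance, and that groups the transposed frame because groupby(..., axis=1) is deprecated in recent pandas:

```python
import pandas as pd

# Question's column labels with invented values 0..13.
cols = pd.MultiIndex.from_tuples([
    ("last", "DOT"), ("quoteVolume", "DOT"),
    ("last", "DOT"), ("quoteVolume", "DOT"),
    ("last", "KSM"), ("quoteVolume", "KSM"),
    ("last", "KSM"), ("quoteVolume", "KSM"),
    ("last", "MOVR"), ("quoteVolume", "MOVR"),
    ("last", "MOVR"), ("quoteVolume", "MOVR"),
    ("last", "ACA"), ("quoteVolume", "ACA"),
])
df = pd.DataFrame([list(range(14))], columns=cols)

# Group the transposed rows on both levels; sort=False keeps first-appearance
# order, then transpose back so the result is column-oriented again.
out = df.T.groupby(level=[0, 1], sort=False).sum().T

print(out.columns.tolist())
# [('last', 'DOT'), ('quoteVolume', 'DOT'), ('last', 'KSM'), ('quoteVolume', 'KSM'),
#  ('last', 'MOVR'), ('quoteVolume', 'MOVR'), ('last', 'ACA'), ('quoteVolume', 'ACA')]
print(out.iloc[0].tolist())  # [2, 4, 10, 12, 18, 20, 12, 13]
```

With sort=False the sort_index call becomes unnecessary; leave sort at its default if alphabetical order is preferred.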