I'm importing an Excel file which has the following structure:
| | Cat 1 | | | | Cat 2 | | | Total |
|code| a | b | c | d | a | b | c | |
|data| data |data|data|data| data |data|data| data |
I want to keep the information in the double header row so I use:
df = pd.read_excel(file, sheet, header=[0,1]
But this gives me the following MultiIndex: print(df.columns)
:
MultiIndex([('Unnamed: 0_level_0', 'code'),
( 'Cat 1', 'a'),
( 'Cat 1', 'b'),
etc.
( 'Cat 2', 'a'),
( 'Cat 2', 'b'),
etc.
( 'Total','Unnamed: 8_level_1')],)
I'm looking for a way to collapse the unnamed: x_level_y
columns so I can access them simply with df['code']
or df['Total']
. I've tried df.rename(columns={'Unnamed: 0_level_0: ''})
, but this isn't generalisable if I don't know which levels are missing, and doesn't allow me to access the column with just the single layered name. The other answers I've found are about removing any columns which contain Unnamed
in the column name, but I want to keep the columns and the data they contain.
CodePudding user response:
You can re-create the MultiIndex and put the existing name in level 0 for all columns where any level contains Unnamed
:
df.columns = pd.MultiIndex.from_tuples(
[(c[1],'') if 'Unnamed' in c[0] else
(c[0],'') if 'Unnamed' in c[1] else
c
for c in df.columns.to_list()])