Remove duplicated columns for multi-level headers in Pandas

Time:12-24

I read an Excel file with df = pd.read_excel('data.xlsx', header=[0, 1], sheet_name='Sheet1'):

         name cpi icpi CPI
         freq M D M
0 2021-02-21 -9.8 31.524 9.806
1 2021-02-22 -5.6 30.777 9.164
2 2021-02-23 3.5 29.318 7.841
3 2021-02-24 -1.1 29.209 7.570
4 2021-02-25 -2.7 29.074 7.467

I would like columns that share the same name (ignoring case) and the same freq across the two header levels to be treated as duplicates and removed, keeping only the first occurrence. How can I do this?

print(df.columns.get_level_values(0))
print(df.columns.to_flat_index())
Index(['name', 'cpi', 'icpi', 'CPI'], dtype='object')
Index([('name', 'freq'), ('cpi', 'M'), ('icpi', 'D'), ('CPI', 'M')], dtype='object')

The expected result:

        name  cpi    icpi
        freq    M       D
0 2021-02-21 -9.8  31.524
1 2021-02-22 -5.6  30.777
2 2021-02-23  3.5  29.318
3 2021-02-24 -1.1  29.209
4 2021-02-25 -2.7  29.074

CodePudding user response:

Convert the column labels to lowercase with rename, build a boolean mask of repeated labels with Index.duplicated, and select only the non-duplicated columns with DataFrame.loc:

df = df.loc[:, ~df.rename(columns=str.lower).columns.duplicated()]
print(df)
         name  cpi    icpi
         freq    M       D
0  2021-02-21 -9.8  31.524
1  2021-02-22 -5.6  30.777
2  2021-02-23  3.5  29.318
3  2021-02-24 -1.1  29.209
4  2021-02-25 -2.7  29.074
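
Note that rename(columns=str.lower) without a level argument lowercases the labels on every header level, so freq codes that differ only in case (for example a hypothetical 'M' versus 'm') would also be treated as equal. If that is not wanted, a minimal sketch along the following lines restricts the lowercasing to the top level. The frame is rebuilt by hand here as a stand-in for the Excel file, and the variable names mask and deduped are illustrative only:

```python
import pandas as pd

# Hand-built stand-in for the frame read from data.xlsx (first two rows only).
columns = pd.MultiIndex.from_tuples(
    [("name", "freq"), ("cpi", "M"), ("icpi", "D"), ("CPI", "M")]
)
df = pd.DataFrame(
    [
        ["2021-02-21", -9.8, 31.524, 9.806],
        ["2021-02-22", -5.6, 30.777, 9.164],
    ],
    columns=columns,
)

# Lowercase only header level 0 (the names); level 1 (freq) keeps its case.
normalized = df.rename(columns=str.lower, level=0).columns

# duplicated() marks every repeat after the first occurrence as True,
# so inverting the mask keeps the first column of each (name, freq) pair.
mask = ~normalized.duplicated()
deduped = df.loc[:, mask]

print(deduped.columns.tolist())
```

Because the mask is computed on a renamed copy and then applied to the original frame, the surviving columns keep their original labels.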