How to concat all columns in a multiindex dataframe?

Time:05-04

I have a multiindex df that I'm trying to concat. The columns are:

a.columns

MultiIndex([(              'Note', '507.3'),
            (              'Note', '507.4'),
            (              'Note', '507.5'),
            (              'Note', '507.6'),
            ('Standard Deviation', '507.3'),
            ('Standard Deviation', '507.4'),
            ('Standard Deviation', '507.5'),
            ('Standard Deviation', '507.6'),
            (             'Value', '507.3'),
            (             'Value', '507.4'),
            (             'Value', '507.5'),
            (             'Value', '507.6')],
           names=[None, 'ESTS id'])

When I do

pd.concat([a['Note']['507.3'],a['Note']['507.4'],a['Note']['507.5']],axis=1)

I get the result I want for those 3 columns.

But I can't figure out how to concat all columns without manually writing them out like that.

I tried

pd.concat([a.columns],axis=1)

TypeError: cannot concatenate object of type '<class 'pandas.core.indexes.multi.MultiIndex'>'; only Series and DataFrame objs are valid

pd.concat(a[a.columns],axis=1)


TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

CodePudding user response:

First, you are more likely to get a helpful answer if you state clearly what output you expect.

However, based on your statement that:

pd.concat([a['Note']['507.3'],a['Note']['507.4'],a['Note']['507.5']], axis=1)

achieved what you want for those three columns, I assume that your intention is to drop the first level of your MultiIndex column names, resulting in a DataFrame with the following columns:

Index(['507.3', '507.4', '507.5', '507.6', '507.3', '507.4', '507.5', '507.6','507.3', '507.4', '507.5', '507.6'], dtype='object')

To achieve this, you can use a.droplevel(level=0, axis=1); see the pandas documentation for DataFrame.droplevel.

Brief Explanation

  1. The level=0 argument tells pandas which level of the MultiIndex you would like to drop. If you are not familiar with levels, it is worth reading up on MultiIndex more generally, for example in the pandas user guide. You can see the levels of a MultiIndex object through its .levels attribute, which holds the unique labels of each level. So in your example, a.columns.levels[0] is a regular Index object containing ['Note', 'Standard Deviation', 'Value'], and a.columns.levels[1] contains the column names you are trying to keep. To get one label per column instead (['Note', 'Note', ..., 'Value']), use a.columns.get_level_values(0).
  2. The axis=1 keyword argument tells pandas you are referring to the columns. The default behaviour is to operate on the row index, i.e. axis=0.
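As a rough illustration of the two points above, here is a minimal sketch. The frame built here is only a stand-in for the asker's a: it has the same column layout, but the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `a`: same column layout as in the question, invented values.
columns = pd.MultiIndex.from_product(
    [["Note", "Standard Deviation", "Value"], ["507.3", "507.4", "507.5", "507.6"]],
    names=[None, "ESTS id"],
)
a = pd.DataFrame(np.arange(24).reshape(2, 12), columns=columns)

# .levels holds the unique labels of each level ...
level0_unique = list(a.columns.levels[0])
# ... while get_level_values gives one label per column.
level0_per_column = list(a.columns.get_level_values(0))

# Drop the outer level of the column MultiIndex; the row index is left alone.
flat = a.droplevel(level=0, axis=1)
flat_columns = list(flat.columns)

# The remaining labels repeat, so the column names are no longer unique.
has_unique_columns = flat.columns.is_unique
```

With duplicate names like these, selecting flat['507.3'] would give back several columns rather than one, which is the caveat discussed under Other Notes.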

Other Notes

Notice that you will be left with non-unique column names, which may cause unintended results later if you are unaware of this. If your only goal is to have columns that are a plain Index rather than a MultiIndex, then it may be a safer approach to do something like:

a.columns = a.columns.to_flat_index()

The .to_flat_index() method (from the documentation):

Convert a MultiIndex to an Index of Tuples containing the level values.

So the above snippet replaces the MultiIndex columns with Index type columns, where each value is a tuple. Therefore, to refer to an individual column you would need to use (as an example):

a[('Note', '507.3')]
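A small sketch of that behaviour, again on a made-up frame (the data and the reduced set of columns are assumptions for illustration, not the asker's actual values):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same two-level column structure, fewer columns.
columns = pd.MultiIndex.from_product(
    [["Note", "Standard Deviation", "Value"], ["507.3", "507.4"]],
    names=[None, "ESTS id"],
)
a = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# Replace the MultiIndex with a flat Index whose entries are tuples.
a.columns = a.columns.to_flat_index()

first_label = a.columns[0]          # a plain tuple
note_5073 = a[("Note", "507.3")]    # select one column by its tuple label
```

The tuple labels stay unique, so each selection returns a single Series.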

Alternatively, if you prefer the convenience of column names which are strings, but want to maintain unique column names, you could do something like:

a.columns = [f"{x}::{y}" for x, y in a.columns]

resulting in a.columns:

Index(['Note::507.3', 'Note::507.4', 'Note::507.5', 'Note::507.6',
       'Standard Deviation::507.3', 'Standard Deviation::507.4',
       'Standard Deviation::507.5', 'Standard Deviation::507.6',
       'Value::507.3', 'Value::507.4', 'Value::507.5', 'Value::507.6'],
      dtype='object')

Note that you may replace the :: with any other separator you like.
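If you want the separator to be configurable, one possible way to wrap this up is sketched below. The helper name flatten_columns and its sep parameter are hypothetical (they are not part of pandas), and the sample frame is invented.

```python
import numpy as np
import pandas as pd

def flatten_columns(df, sep="::"):
    """Return a copy of df with each column's level labels joined into one string.

    Hypothetical helper for illustration; not a pandas function.
    """
    out = df.copy()
    # Iterating over MultiIndex columns yields one tuple of labels per column.
    out.columns = [sep.join(map(str, labels)) for labels in df.columns]
    return out

# Made-up frame with two-level columns.
columns = pd.MultiIndex.from_product(
    [["Note", "Value"], ["507.3", "507.4"]], names=[None, "ESTS id"]
)
a = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

flat = flatten_columns(a, sep="_")
flat_names = list(flat.columns)
```

Working on a copy leaves the original a untouched, which can be handy if you still need the MultiIndex elsewhere.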
