Home > database >  Pandas MultiIndex get all of one level 0 columns and one specific column
Pandas MultiIndex get all of one level 0 columns and one specific column

Time:07-20

I have a pandas DataFrame with a MultiIndex on the columns and want to access all of the columns of bar plus one additional column under baz - how can I achieve that?

Input:

import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)

first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
0      -1.085631  0.997345  0.282978 -1.506295 -0.578600  1.651437 -2.426679 -0.428913
1       1.265936 -0.866740 -0.678886 -0.094709  1.491390 -0.638902 -0.443982 -0.434351
2       2.205930  2.186786  1.004054  0.386186  0.737369  1.490732 -0.935834  1.175829
3      -1.253881 -0.637752  0.907105 -1.428681 -0.140069 -0.861755 -0.255619 -2.798589

Expected output:

first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681

I can specify specific columns like df.loc[:, [("bar", "one"), ("bar", "two"), ("baz", "two")]] but I can't seem to specify an arbitrary number for only one top-level, like df.loc[:, ["bar", ("baz", "two")]]

CodePudding user response:

iloc solution

ix = df.columns.get_loc
df.iloc[:, np.r_[ix('bar'), ix(('baz', 'two'))]]

first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681

CodePudding user response:

If you don't want to key in everything, you can split it up and then use pd.concat.

bar = df.loc[:, ["bar"]]
baz = df.loc[:, [('baz','two')]]
pd.concat([bar,baz], axis=1)

Out[451]: 
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681

CodePudding user response:

An alternative route would be to select all of Baz and bar and subsequently drop the unwanted column :

df.loc(axis = 1)[['bar','baz']].drop(columns=('baz','one'))
Out[24]:
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681

You could also preselect the relevant columns with a list comprehension:

columns = [(left, right) 
           for left, right in df.columns 
           if left == 'bar' 
           or (left, right) == ('baz', 'one')]

df.loc[:, columns]

first        bar                 baz
second       one       two       one
0      -1.085631  0.997345  0.282978
1       1.265936 -0.866740 -0.678886
2       2.205930  2.186786  1.004054
3      -1.253881 -0.637752  0.907105

with pyjanitor's select_columns, one can pass a callable, in this case it mimics the list comprehension above:

df.select_columns(lambda df: df.name[0] == 'bar' or df.name == ('baz', 'one'))
Out[31]:
first        bar                 baz
second       one       two       one
0      -1.085631  0.997345  0.282978
1       1.265936 -0.866740 -0.678886
2       2.205930  2.186786  1.004054
3      -1.253881 -0.637752  0.907105
  • Related