Accessing multiple columns in 3d pandas dataframe [duplicate]-CodePudding

After constructing a 3-d pandas dataframe I have difficulty accessing only specific columns.

The scenario is as follows:

head = ["h1", "h2"]
cols = ["col_1", "col_2", "col_3"]
heads = len(cols) * [head[0]]   len(cols) * [head[1]]  # -> ['h1','h1','h1','h2','h2','h2']
no_of_rows = 4

A = np.array(heads)
B = np.array(cols * len(head))  # -> ['col_1','col_2','col_3','col_1','col_2','col_3']
C = np.array([np.zeros(no_of_rows)] * len(head) * len(cols))  # -> shape=(6, 4)

df = pd.DataFrame(data=C.T, 
                  columns=pd.MultiIndex.from_tuples(zip(A,B)))

yielding

    h1                      h2
    col_1   col_2   col_3   col_1   col_2   col_3
0   0.0     0.0     0.0     0.0     0.0     0.0
1   0.0     0.0     0.0     0.0     0.0     0.0
2   0.0     0.0     0.0     0.0     0.0     0.0
3   0.0     0.0     0.0     0.0     0.0     0.0

Now I would like to get e.g. all col_1, meaning col_1 of h1 and col_1 of h2. The output should look like this

    h1      h2
    col_1   col_1   
0   0.0     0.0     
1   0.0     0.0     
2   0.0     0.0     
3   0.0     0.0

Any suggestions how I could access those two columns?

CodePudding user response：

You can use loc with get_level_values(1), as your columns col1, col2, col3 are in the first level of your index:

>>> df.loc[:,df.columns.get_level_values(1).isin(['col_1'])]

    h1    h2
  col_1 col_1
0   0.0   0.0
1   0.0   0.0
2   0.0   0.0
3   0.0   0.0

If you want to grab all columns under h1, you can set get_level_values(0), and grab h1:

>>> df.loc[:,df.columns.get_level_values(0).isin(['h1'])]

    h1            
  col_1 col_2 col_3
0   0.0   0.0   0.0
1   0.0   0.0   0.0
2   0.0   0.0   0.0
3   0.0   0.0   0.0

CodePudding user response：

You can use df.loc with slice(None), as follows:

df.loc[:, (slice(None), 'col_1')]

or use pd.IndexSlice, as follows:

idx = pd.IndexSlice
df.loc[:, idx[:, 'col_1']]

or simply:

df.loc[:, pd.IndexSlice[:, 'col_1']]

(Defining extra variable idx for pd.IndexSlice is useful as a shorthand if you are going to use pd.IndexSlice multiple times. )

Result:

     h1    h2
  col_1 col_1
0   0.0   0.0
1   0.0   0.0
2   0.0   0.0
3   0.0   0.0

You can also do it with .xs() as follows:

df.xs('col_1', level=1, axis=1)

Result:

    h1   h2
0  0.0  0.0
1  0.0  0.0
2  0.0  0.0
3  0.0  0.0

Slightly different output without the repeating col_1 column labels shown.

The first 2 ways support selecting multiple columns too, e.g. ['col_1', 'col_3']:

df.loc[:, (slice(None), ['col_1', 'col_3'])]

and also:

df.loc[:, pd.IndexSlice[:, ['col_1', 'col_3']]]

Result:

     h1          h2      
  col_1 col_3 col_1 col_3
0   0.0   0.0   0.0   0.0
1   0.0   0.0   0.0   0.0
2   0.0   0.0   0.0   0.0
3   0.0   0.0   0.0   0.0