Testing functionality of slice(None) in Pandas MultiIndex

I'm trying to understand the use cases for slice(None). The code for building the sample dataframe comes from the pandas user guide (https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html), but I'm repeating it here for convenience:

import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]


miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)


micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")], names=["lvl0", "lvl1"]
)


dfmi = (
    pd.DataFrame(
        np.arange(len(miindex) * len(micolumns)).reshape(
            (len(miindex), len(micolumns))
        ),
        index=miindex,
        columns=micolumns,
    )
    .sort_index()
    .sort_index(axis=1)
)

I'm trying to test my understanding as follows:

dfmi.loc[ (slice('A0'), slice('B0'),slice('C1', 'C2')),:]
dfmi.loc[ (slice('A0'), slice('B0'),slice('C1', 'C2')), ['a']]
dfmi.loc[ (slice('A0'), slice('B0'),slice('C1', 'C2')), slice('a')]

The first line lets me extract A0, B0, and C1 through C2 from the index. The second line extracts every column under 'a' in the column index, and the last line is just a repeat of the second using slice notation for the columns.
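
One detail worth checking here: slice('A0') is Python shorthand for slice(None, 'A0'), meaning "everything up to and including A0", not "exactly A0". It only looks like an exact match because A0 is the first label. The sketch below rebuilds the same dfmi and compares slice('A0') with slice('A1'); the names rows_a0 and rows_a1 are illustrative only.

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return [f"{prefix}{i}" for i in range(n)]


miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")], names=["lvl0", "lvl1"]
)
dfmi = (
    pd.DataFrame(
        np.arange(len(miindex) * len(micolumns)).reshape(
            (len(miindex), len(micolumns))
        ),
        index=miindex,
        columns=micolumns,
    )
    .sort_index()
    .sort_index(axis=1)
)

# A single-argument slice only sets the stop value.
print(slice("A0") == slice(None, "A0"))  # True

rows_a0 = dfmi.loc[(slice("A0"), slice("B0"), slice("C1", "C2")), :]
rows_a1 = dfmi.loc[(slice("A1"), slice("B0"), slice("C1", "C2")), :]

# slice("A0") stops at the first label, so only A0 is returned ...
print(rows_a0.index.get_level_values(0).unique().tolist())  # ['A0']
# ... while slice("A1") returns everything up to and including A1.
print(rows_a1.index.get_level_values(0).unique().tolist())  # ['A0', 'A1']
```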

My question is: can I use slice to also extract just one specific column (say 'bar')? The documentation doesn't quite "complete" the example; it could have shown how to extract a specific column under a column index using slice.

I tried:

dfmi.loc[ (slice('A0'), slice('B0'),slice('C1', 'C2')), slice('a', 'bar')]

but this gives me both a and b (both top-level labels of the column index). The code below has the same effect:

dfmi.loc[ (slice('A0'), slice('B0'),slice('C1', 'C2')), slice('bar')]
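
A possible explanation for both results, sketched under the assumption that a bare string slice on the MultiIndex columns is resolved against the first level (lvl0) by sort order: as plain strings, "a" < "b" < "bar", so the range from 'a' to 'bar' covers both top-level labels. slice('bar') is slice(None, 'bar'), which covers the same range. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return [f"{prefix}{i}" for i in range(n)]


miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")], names=["lvl0", "lvl1"]
)
dfmi = (
    pd.DataFrame(
        np.arange(len(miindex) * len(micolumns)).reshape(
            (len(miindex), len(micolumns))
        ),
        index=miindex,
        columns=micolumns,
    )
    .sort_index()
    .sort_index(axis=1)
)

# Ordinary string comparison places "bar" after "b".
print("a" < "b" < "bar")  # True

rows = (slice("A0"), slice("B0"), slice("C1", "C2"))
a_to_bar = dfmi.loc[rows, slice("a", "bar")]
up_to_bar = dfmi.loc[rows, slice("bar")]

# Both slices span the whole first column level.
lvl0_a_to_bar = a_to_bar.columns.get_level_values("lvl0").unique().tolist()
lvl0_up_to_bar = up_to_bar.columns.get_level_values("lvl0").unique().tolist()
print(lvl0_a_to_bar)   # ['a', 'b']
print(lvl0_up_to_bar)  # ['a', 'b']
```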

and finally,

# dfmi.loc[ (slice('A0'), slice('B0'),slice('C1', 'C2')), ['bar']]

gives the error - 'bar' is not in index (as expected).

Am I correct in stating that slice cannot be used to extract 'a' and 'bar', and that I'll need to switch to IndexSlice or xs? Appreciate the help!

CodePudding user response：

Does this get you what you want?

dfmi.loc[ (slice('A0'), slice('B0'),slice('C1', 'C2'))]['a','bar']
A0  B0  C1  D0     9
            D1    13
        C2  D0    17
            D1    21
Name: (a, bar), dtype: int32

Or this? Use parentheses around the column slices, as with the index:

dfmi.loc[ (slice('A0'), slice('B0'),slice('C1', 'C2')),(slice('a'), slice('bar'))]

lvl0          a
lvl1        bar
A0 B0 C1 D0   9
         D1  13
      C2 D0  17
         D1  21
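
A caveat, sketched rather than stated as the definitive behaviour: each slice in the column tuple is an upper-bounded range on its own level (slice('a') is slice(None, 'a'), and slice('bar') is slice(None, 'bar')), so the result above is exactly ('a', 'bar') only because no other column falls inside both ranges. Widening either range pulls in more columns, as the example below tries to show. The variable names are illustrative.

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return [f"{prefix}{i}" for i in range(n)]


miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")], names=["lvl0", "lvl1"]
)
dfmi = (
    pd.DataFrame(
        np.arange(len(miindex) * len(micolumns)).reshape(
            (len(miindex), len(micolumns))
        ),
        index=miindex,
        columns=micolumns,
    )
    .sort_index()
    .sort_index(axis=1)
)

rows = (slice("A0"), slice("B0"), slice("C1", "C2"))

# lvl0 up to 'a', lvl1 up to 'bar': only ('a', 'bar') lies in both ranges.
narrow = dfmi.loc[rows, (slice("a"), slice("bar"))]
print(narrow.columns.tolist())  # [('a', 'bar')]

# lvl0 up to 'b', lvl1 up to 'bar': ('b', 'bah') now qualifies as well.
wider_lvl0 = dfmi.loc[rows, (slice("b"), slice("bar"))]
print(wider_lvl0.columns.tolist())  # [('a', 'bar'), ('b', 'bah')]

# lvl0 up to 'a', lvl1 up to 'foo': both columns under 'a' are returned.
wider_lvl1 = dfmi.loc[rows, (slice("a"), slice("foo"))]
print(wider_lvl1.columns.tolist())  # [('a', 'bar'), ('a', 'foo')]
```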

CodePudding user response：

We can use IndexSlice for advanced indexing, which is the preferred way:

dfmi.loc[pd.IndexSlice['A0', 'B0', 'C1':'C2'], ('a', 'bar')]

Alternatively, with slice you can do:

dfmi.loc[('A0', 'B0', slice('C1', 'C2')), ('a', 'bar')]

A0  B0  C1  D0     9
            D1    13
        C2  D0    17
            D1    21
Name: (a, bar), dtype: int32
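
Both forms above return a Series because ('a', 'bar') names exactly one column. As a rough sketch of two variations (the names idx, as_frame and via_xs are illustrative, not from the original answer): wrapping the inner label in a list should keep the result as a one-column DataFrame, and xs, which the question mentions, can select 'bar' by level name.

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    return [f"{prefix}{i}" for i in range(n)]


miindex = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
micolumns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")], names=["lvl0", "lvl1"]
)
dfmi = (
    pd.DataFrame(
        np.arange(len(miindex) * len(micolumns)).reshape(
            (len(miindex), len(micolumns))
        ),
        index=miindex,
        columns=micolumns,
    )
    .sort_index()
    .sort_index(axis=1)
)

idx = pd.IndexSlice

# Exact column key: the result is a Series named ('a', 'bar').
as_series = dfmi.loc[idx["A0", "B0", "C1":"C2"], ("a", "bar")]
print(as_series.tolist())  # [9, 13, 17, 21]

# List-like inner label: the result stays a DataFrame with MultiIndex columns.
as_frame = dfmi.loc[idx["A0", "B0", "C1":"C2"], idx["a", ["bar"]]]
print(as_frame.columns.tolist())  # [('a', 'bar')]

# xs picks 'bar' on lvl1 and drops that level from the columns by default.
subset = dfmi.loc[idx["A0", "B0", "C1":"C2"], :]
via_xs = subset.xs("bar", axis=1, level="lvl1")
print(via_xs.columns.tolist())  # ['a']
```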