Home > Enterprise >  Slice MultiIndex by multiple tuples
Slice MultiIndex by multiple tuples

Time:02-01

I have a DataFrame with multiple index levels. I define some subset by selecting multiple combinations of all levels but the last. Then I want to slice the original DataFrame with that subset, but I cannot find how. Best is to look at a simple example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': ['A', 'B', 'A', 'B'], 'b': ['X', 'X', 'X', 'Y'],
   ...:                    'c': ['S', 'T', 'T', 'T'], 'd': [1, 2, 3, 1]}).set_index(['a', 'b', 'c'])

In [3]: print(df.to_string())
       d
a b c
A X S  1
B X T  2
A X T  3
B Y T  1

In [4]: sel = df.index.droplevel('c')[df.d == 1]  # Some selection on multiple index levels.

In [5]: print(sel)
MultiIndex([('A', 'X'),
            ('B', 'Y')],
           names=['a', 'b'])

Now I would like all rows from df where (a, b) in sel, in this case all but the second row. I tried .loc, .xs and more.

I'm sure I can manipulate the index (drop level c, select, then add level c again), but that feels like a workaround. The same goes for an inner join. I must be overlooking some method...?

CodePudding user response:

One idea is use Index.isin with boolean indexing:

df = df[df.index.droplevel('c').isin(sel)]
print (df)
       d
a b c   
A X S  1
    T  3
B Y T  1
  • Related