I have a DataFrame with multiple index levels. I define some subset by selecting multiple combinations of all levels but the last. Then I want to slice the original DataFrame with that subset, but I cannot find how. Best is to look at a simple example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': ['A', 'B', 'A', 'B'], 'b': ['X', 'X', 'X', 'Y'],
...: 'c': ['S', 'T', 'T', 'T'], 'd': [1, 2, 3, 1]}).set_index(['a', 'b', 'c'])
In [3]: print(df.to_string())
d
a b c
A X S 1
B X T 2
A X T 3
B Y T 1
In [4]: sel = df.index.droplevel('c')[df.d == 1] # Some selection on multiple index levels.
In [5]: print(sel)
MultiIndex([('A', 'X'),
('B', 'Y')],
names=['a', 'b'])
Now I would like all rows from df
where (a
, b
) in sel
, in this case all but the second row. I tried .loc
, .xs
and more.
I'm sure I can manipulate the index (drop level c
, select, then add level c
again), but that feels like a workaround. The same goes for an inner join. I must be overlooking some method...?
CodePudding user response:
One idea is use Index.isin
with boolean indexing
:
df = df[df.index.droplevel('c').isin(sel)]
print (df)
d
a b c
A X S 1
T 3
B Y T 1