Suppose we'd like to filter a dataframe with conditions that refer to a slice of a multi-index column, here is a toy example:
import pandas as pd
import numpy as np
ix = pd.IndexSlice
df = pd.DataFrame(data=np.array([[1, 2, 3, None], [None, 2, 3, 4]]).T,
columns=pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y')]))
# this gives
a b
x y
0 1 None
1 2 2
2 3 3
3 None 4
I would like to filter this dataframe with a condition that resembles my real use-case:
df[df.isnull().any(axis=1) | df[ix[:, 'x']].isin([2])]
=> TypeError: unhashable type: 'slice'
or even:
df[df.isnull().any(axis=1) | df[df.columns[df.columns.droplevel(0).isin(['x'])]].isin([2])]
=> ValueError: cannot join with no overlapping index names
I would like to apply such a filtering criteria but paying attention to the fact that the columns level 0 should be ignored when looking up the column name.
I have also tried:
df[df.isnull().any(axis=1) | df[(slice(None), 'x')].isin([2])]
=> TypeError: unhashable type: 'slice'
CodePudding user response:
Second condition return one or multiple columns DataFrame, so is possible add any
for Series mask:
df1 = df[df.isnull().any(axis=1) | df.xs('x', axis=1, level=1).isin([2]).any(axis=1)]
Or:
df1 = df[df.isnull().any(axis=1) | df.loc[:, ix[:, 'x']].isin([2]).any(axis=1)]