Home > database >  How to filter a dataframe using a condition that requires referencing a slice of a multi-index colum
How to filter a dataframe using a condition that requires referencing a slice of a multi-index colum

Time:04-05

Suppose we'd like to filter a dataframe with conditions that refer to a slice of a multi-index column, here is a toy example:

import pandas as pd
import numpy as np
ix = pd.IndexSlice

df = pd.DataFrame(data=np.array([[1, 2, 3, None], [None, 2, 3, 4]]).T,
                  columns=pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y')]))
# this gives
      a     b
      x     y
0     1  None
1     2     2
2     3     3
3  None     4

I would like to filter this dataframe with a condition that resembles my real use-case:

df[df.isnull().any(axis=1) | df[ix[:, 'x']].isin([2])]
=> TypeError: unhashable type: 'slice'

or even:

df[df.isnull().any(axis=1) | df[df.columns[df.columns.droplevel(0).isin(['x'])]].isin([2])]
=> ValueError: cannot join with no overlapping index names

I would like to apply such a filtering criteria but paying attention to the fact that the columns level 0 should be ignored when looking up the column name.

I have also tried:

df[df.isnull().any(axis=1) | df[(slice(None), 'x')].isin([2])]
=> TypeError: unhashable type: 'slice'

CodePudding user response:

Second condition return one or multiple columns DataFrame, so is possible add any for Series mask:

df1 = df[df.isnull().any(axis=1) | df.xs('x', axis=1, level=1).isin([2]).any(axis=1)]

Or:

df1 = df[df.isnull().any(axis=1) | df.loc[:, ix[:, 'x']].isin([2]).any(axis=1)]
  • Related