Home > database >  Boolean filtering of different lengths with Multi Indexes
Boolean filtering of different lengths with Multi Indexes

Time:01-20

I am trying to filter my multi-indexed data df based on the column value employ_start_date and index employ_class .

data = pd.DataFrame({'agcy_nbr': {0: 166, 1: 435, 2: 129, 3: 129, 4: 129},
 'employ_class': {0: 'Corr', 1: 'LE', 2: 'Corr', 3: 'Corr', 4: 'Corr'},
 'employ_start_date': {0: 1999, 1: 2001, 2: 2002, 3: 204, 4: 1995},
 'person_nbr': {0: 156723, 1: 154633, 2: 109360, 3: 139837, 4: 153842}})

df = pd.pivot_table(data, index = ['agcy_nbr', 'employ_start_date'], columns = ['employ_class'], values = ['person_nbr'], aggfunc = 'count')

which can separetely be obtained through the following:

df.iloc[:,df.columns.get_level_values('employ_class') == 'LE']
df[df.index.get_level_values('employ_start_date') > 2000]

I would like to filter df based on the two conditions, to something that would look similar to this:

df.iloc[:,(df.index.get_level_values('employ_start_date') > 2000) & (df.columns.get_level_values('employ_class') == 'LE')]

but I get ValueError: operands could not be broadcast together with shapes(5,) (2,). I tried reshaping, flattening and raveling both the index and columns but without success.

CodePudding user response:

Specify the respective (df.loc[<for rows labels>:<for columns labels>) rows/columns multiindices conditions in access/indexing call (pandas.DataFrame.loc):

df.loc[df.index.get_level_values('employ_start_date') > 2000, df.columns.get_level_values('employ_class') == 'LE']

                           person_nbr
employ_class                       LE
agcy_nbr employ_start_date           
129      2002                     NaN
435      2001                     1.0
  • Related