I am trying to filter my multi-indexed data df
based on the column value employ_start_date
and index employ_class
.
data = pd.DataFrame({'agcy_nbr': {0: 166, 1: 435, 2: 129, 3: 129, 4: 129},
'employ_class': {0: 'Corr', 1: 'LE', 2: 'Corr', 3: 'Corr', 4: 'Corr'},
'employ_start_date': {0: 1999, 1: 2001, 2: 2002, 3: 204, 4: 1995},
'person_nbr': {0: 156723, 1: 154633, 2: 109360, 3: 139837, 4: 153842}})
df = pd.pivot_table(data, index = ['agcy_nbr', 'employ_start_date'], columns = ['employ_class'], values = ['person_nbr'], aggfunc = 'count')
which can separetely be obtained through the following:
df.iloc[:,df.columns.get_level_values('employ_class') == 'LE']
df[df.index.get_level_values('employ_start_date') > 2000]
I would like to filter df
based on the two conditions, to something that would look similar to this:
df.iloc[:,(df.index.get_level_values('employ_start_date') > 2000) & (df.columns.get_level_values('employ_class') == 'LE')]
but I get ValueError: operands could not be broadcast together with shapes(5,) (2,)
. I tried reshaping, flattening and raveling both the index and columns but without success.
CodePudding user response:
Specify the respective (df.loc[<for rows labels>:<for columns labels>
) rows/columns multiindices conditions in access/indexing call (pandas.DataFrame.loc
):
df.loc[df.index.get_level_values('employ_start_date') > 2000, df.columns.get_level_values('employ_class') == 'LE']
person_nbr
employ_class LE
agcy_nbr employ_start_date
129 2002 NaN
435 2001 1.0