# hierarchical indices and columns
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
names=['subject', 'type'])
# mock some data
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 10
data = 37
# create the DataFrame
health_data = pd.DataFrame(data, index=index, columns=columns)
health_data
How can I get all rows that are using pd.query
:
- Bob HR > 35
- any subjects HR > 35
If this is not possible with pd.query
at the moment, this would be the accepted answer.
CodePudding user response:
You can't use MultiIndexes with query
.
(Well, you can to some limit use health_data.query('@health_data.Bob.HR > 35')
, but this has many flaws, and you won't be able to do this with a sublevel only).
You can use xs
instead to select your levels:
# is Bod/HR > 35?
m1 = health_data[('Bob', 'HR')].gt(35)
# is any HR > 35?
m2 = health_data.xs('HR', level='type', axis=1).gt(35).any(axis=1)
# keep rows with both conditions
out = health_data.loc[m1&m2]
output:
subject Bob Guido Sue
type HR Temp HR Temp HR Temp
year visit
2013 1 42.0 38.0 45.0 37.7 33.0 36.9
2 48.0 38.2 42.0 36.2 21.0 35.9
2014 2 50.0 39.2 44.0 36.0 36.0 35.8