Home > Back-end >  How to refer to hierachical column in a pandas query?
How to refer to hierachical column in a pandas query?

Time:10-25

# hierarchical indices and columns
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])

# mock some data
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 10
data  = 37

# create the DataFrame
health_data = pd.DataFrame(data, index=index, columns=columns)
health_data

How can I get all rows that are using pd.query:

  1. Bob HR > 35
  2. any subjects HR > 35

If this is not possible with pd.query at the moment, this would be the accepted answer.

CodePudding user response:

You can't use MultiIndexes with query.

(Well, you can to some limit use health_data.query('@health_data.Bob.HR > 35'), but this has many flaws, and you won't be able to do this with a sublevel only).

You can use xs instead to select your levels:

# is Bod/HR > 35?
m1 = health_data[('Bob', 'HR')].gt(35)
# is any HR > 35?
m2 = health_data.xs('HR', level='type', axis=1).gt(35).any(axis=1)

# keep rows with both conditions
out = health_data.loc[m1&m2]

output:

subject      Bob       Guido         Sue      
type          HR  Temp    HR  Temp    HR  Temp
year visit                                    
2013 1      42.0  38.0  45.0  37.7  33.0  36.9
     2      48.0  38.2  42.0  36.2  21.0  35.9
2014 2      50.0  39.2  44.0  36.0  36.0  35.8
  • Related