Home > Back-end >  A more convenient way, than 'query', to select MultiIndexed rows, designating partial labe
A more convenient way, than 'query', to select MultiIndexed rows, designating partial labe

Time:10-12

Let's say you have a lot of named index levels -- I'll show 4 here but use your imagination:

midx = pd.MultiIndex.from_product([['A0','A1'], ['B0','B1'],['C0','C1'],['D0','D1']],names=['quack','woof','honk','snarf'])
dfmi = pd.DataFrame(np.arange(32).reshape((len(midx), len(columns))),index=midx, columns=columns)
dfmi
                       foo  bar
quack woof honk snarf          
A0    B0   C0   D0       0    1
                D1       2    3
           C1   D0       4    5
                D1       6    7
      B1   C0   D0       8    9
                D1      10   11
           C1   D0      12   13
                D1      14   15
A1    B0   C0   D0      16   17
                D1      18   19
           C1   D0      20   21
                D1      22   23
      B1   C0   D0      24   25
                D1      26   27
           C1   D0      28   29
                D1      30   31

Then, somewhere along the line, you forgot which level number the name 'snarf' was associated with, and even how many levels there were, and you'd like to do something like:

dfmi[dfmi.snarf=='D1']

except the DataFrame is big so keeping another, reset_indexed copy around would take too much space and it would be slow anyway, and, being lazy, you don't want to go look it up, and you don't want to dive into yet another syntax with query:

dfmi.query('snarf'=='D1')

oops!

dfmi.query("'snarf'=='D1'")

oops AGAIN!

dfmi.query("snarf=='D1'")
                      foo  bar
quack woof honk snarf          
A0    B0   C0   D1       2    3
           C1   D1       6    7
      B1   C0   D1      10   11
           C1   D1      14   15
A1    B0   C0   D1      18   19
           C1   D1      22   23
      B1   C0   D1      26   27
           C1   D1      30   31

Finally!

CodePudding user response:

Not sure it's more convenient, but one alternative to string based query is to use index.get_level_values:

dfmi[dfmi.index.get_level_values('snarf') == 'D1']

                       foo  bar
quack woof honk snarf          
A0    B0   C0   D1       2    3
           C1   D1       6    7
      B1   C0   D1      10   11
           C1   D1      14   15
A1    B0   C0   D1      18   19
           C1   D1      22   23
      B1   C0   D1      26   27
           C1   D1      30   31

CodePudding user response:

If we're looking for corresponding values (equality comparison) within a given level by name, then xs could work:

dfmi.xs('D1', level='snarf', drop_level=False)

                       foo  bar
quack woof honk snarf          
A0    B0   C0   D1       2    3
           C1   D1       6    7
      B1   C0   D1      10   11
           C1   D1      14   15
A1    B0   C0   D1      18   19
           C1   D1      22   23
      B1   C0   D1      26   27
           C1   D1      30   31

CodePudding user response:

You can also use:

df = df[df.index.isin(['D1'], level='snarf')]

Output:

                       foo  bar
quack woof honk snarf          
A0    B0   C0   D1       2    3
           C1   D1       6    7
      B1   C0   D1      10   11
           C1   D1      14   15
A1    B0   C0   D1      18   19
           C1   D1      22   23
      B1   C0   D1      26   27
           C1   D1      30   31

CodePudding user response:

You could just keep a copy of the index

inds = dfmi.index.to_frame()
dfmi[inds.snarf == "D1"]
 
#                        foo  bar
# quack woof honk snarf          
# A0    B0   C0   D1       2    3
#            C1   D1       6    7
#       B1   C0   D1      10   11
#            C1   D1      14   15
# A1    B0   C0   D1      18   19
#            C1   D1      22   23
#       B1   C0   D1      26   27
#            C1   D1      30   31
  • Related