Home > Software engineering >  Pandas - Check if value from a column exists in any index of a MultiIndex dataframe
Pandas - Check if value from a column exists in any index of a MultiIndex dataframe

Time:07-22

I got a MultiIndex dataframe called df with two indexes (index1, index2). I want to search by row to check if the value in the Column exists in any of the multi-index.

This is what df looks like

                Column
index1 index2
   a     b         b
         c         e
         d         c
   f     e         e
         g         e
         h         f

In terms of the boolean value, this is what I want to generate in order to filter the dataframe

                Column
index1 index2
   a     b       True
         c       False
         d       False
   f     e       True
         g       False
         h       True (f is in the index1)

The final output should be like this:

                Column
index1 index2
   a     b         b
   f     e         e
         h         f

Is there any good practice to handle this?

CodePudding user response:

You can use get_level_values:

m1 = df.index.get_level_values('index1') == df['Column']
m2 = df.index.get_level_values('index2') == df['Column']
out = df[m1|m2]
print(out)

# Output
              Column
index1 index2       
a      b           b
f      e           e
       h           f

Generic way

import numpy as np

out = df[np.logical_or.reduce([list(lvl) == df['Column'] for lvl in zip(*df.index)])]
print(out)

# Output
              Column
index1 index2       
a      b           b
f      e           e
       h           f

CodePudding user response:

Use MultiIndex.to_frame with DataFrame.eq for compare all levels and DataFrame.any for test if at least one level is match:

df1 = df[df.index.to_frame().eq(df['Column'], axis=0).any(axis=1)]
print (df1)
              Column
index1 index2       
a      b           b
f      e           e
       h           f

Or use list comprehension with in for test if exist value of column in index:

df1 = df[[v in k for k, v in df['Column'].items()]]
print (df1)
              Column
index1 index2       
a      b           b
f      e           e
       h           f
  • Related