I got a MultiIndex dataframe called df with two indexes (index1, index2). I want to search by row to check if the value in the Column exists in any of the multi-index.
This is what df looks like
Column
index1 index2
a b b
c e
d c
f e e
g e
h f
In terms of the boolean value, this is what I want to generate in order to filter the dataframe
Column
index1 index2
a b True
c False
d False
f e True
g False
h True (f is in the index1)
The final output should be like this:
Column
index1 index2
a b b
f e e
h f
Is there any good practice to handle this?
CodePudding user response:
You can use get_level_values
:
m1 = df.index.get_level_values('index1') == df['Column']
m2 = df.index.get_level_values('index2') == df['Column']
out = df[m1|m2]
print(out)
# Output
Column
index1 index2
a b b
f e e
h f
Generic way
import numpy as np
out = df[np.logical_or.reduce([list(lvl) == df['Column'] for lvl in zip(*df.index)])]
print(out)
# Output
Column
index1 index2
a b b
f e e
h f
CodePudding user response:
Use MultiIndex.to_frame
with DataFrame.eq
for compare all levels and DataFrame.any
for test if at least one level is match:
df1 = df[df.index.to_frame().eq(df['Column'], axis=0).any(axis=1)]
print (df1)
Column
index1 index2
a b b
f e e
h f
Or use list comprehension with in
for test if exist value of column in index:
df1 = df[[v in k for k, v in df['Column'].items()]]
print (df1)
Column
index1 index2
a b b
f e e
h f