Home > Enterprise >  Pandas MultiIndex select rows where value is between two levels
Pandas MultiIndex select rows where value is between two levels

Time:12-25

I have a DataFrame with a MultiIndex with 2 levels A and B. I want to select a row by querying for a value that is between level A and B. There is only one row that matches each value. Selecting a single row works fine using slicing.

import pandas as pd

df = pd.DataFrame({
    'A': [0, 3, 6, 9],
    'B': [2, 5, 8, 11],
    'C': [0, 1, 2, 3],
})
df = df.set_index(['A', 'B'], drop=True)

print(df)
#       C
# A B    
# 0 2   0
# 3 5   1
# 6 8   2
# 9 11  3

# Get row where 1 in between A and B
print(df.loc[(slice(None, 1), slice(1, None)), :])
#      C
# A B   
# 0 2  0

However, I need to select many rows. Doing this one by one is slow. Is there any way to select multiple rows?

Note: I don't have to use a DataFrame. If there is some data structure that is better suited for this job, I would be happy to use that.

CodePudding user response:

You can use .between and pass columns 'A' and 'B' without having to make them levels until later. In the below example, there are two rows where 'C' is between 'A' and 'B':

df = pd.DataFrame({
    'A': [0, 3, 6, 9],
    'B': [2, 5, 8, 11],
    'C': [0, 1, 2, 10],
})

df_selected = df[df['C'].between(df['A'],df['B'])].set_index(['A', 'B'], drop=True)

Result:

       C
A B     
0 2    0
9 11  10

CodePudding user response:

You can try this:

df2 = df.reset_index()
df2.query("A <= C <= B").set_index(['A', 'B'], drop=True)

CodePudding user response:

To get a more instructive example, let's expand your source DataFrame by one more row with the value of C between A and B:

        C
A  B     
0  2    0
3  5    1
6  8    2
9  11   3
11 13  12

To get rows where C is between A and B you can:

  1. Start from saving the index of df in a separate variable:

    ind = df.index
    
  2. Then, to get what you want, run:

    result = df[np.logical_and(ind.get_level_values(0) <= df.C,
        ind.get_level_values(1) >= df.C)]
    

The result is:

        C
A  B     
0  2    0
11 13  12

So, as you can see, there is no need to reset the index.

Or without np.logical_and:

result = df[(ind.get_level_values(0) <= df.C) & (ind.get_level_values(1) >= df.C)]
  • Related