I have a DataFrame with a MultiIndex with 2 levels A
and B
. I want to select a row by querying for a value that is between level A
and B
. There is only one row that matches each value.
Selecting a single row works fine using slicing.
import pandas as pd
df = pd.DataFrame({
'A': [0, 3, 6, 9],
'B': [2, 5, 8, 11],
'C': [0, 1, 2, 3],
})
df = df.set_index(['A', 'B'], drop=True)
print(df)
# C
# A B
# 0 2 0
# 3 5 1
# 6 8 2
# 9 11 3
# Get row where 1 in between A and B
print(df.loc[(slice(None, 1), slice(1, None)), :])
# C
# A B
# 0 2 0
However, I need to select many rows. Doing this one by one is slow. Is there any way to select multiple rows?
Note: I don't have to use a DataFrame. If there is some data structure that is better suited for this job, I would be happy to use that.
CodePudding user response:
You can use .between
and pass columns 'A'
and 'B'
without having to make them levels until later. In the below example, there are two rows where 'C'
is between 'A' and 'B':
df = pd.DataFrame({
'A': [0, 3, 6, 9],
'B': [2, 5, 8, 11],
'C': [0, 1, 2, 10],
})
df_selected = df[df['C'].between(df['A'],df['B'])].set_index(['A', 'B'], drop=True)
Result:
C
A B
0 2 0
9 11 10
CodePudding user response:
You can try this:
df2 = df.reset_index()
df2.query("A <= C <= B").set_index(['A', 'B'], drop=True)
CodePudding user response:
To get a more instructive example, let's expand your source DataFrame by one more row with the value of C between A and B:
C
A B
0 2 0
3 5 1
6 8 2
9 11 3
11 13 12
To get rows where C is between A and B you can:
Start from saving the index of df in a separate variable:
ind = df.index
Then, to get what you want, run:
result = df[np.logical_and(ind.get_level_values(0) <= df.C, ind.get_level_values(1) >= df.C)]
The result is:
C
A B
0 2 0
11 13 12
So, as you can see, there is no need to reset the index.
Or without np.logical_and:
result = df[(ind.get_level_values(0) <= df.C) & (ind.get_level_values(1) >= df.C)]