how to select rows in pandas dataframe based on between from another column-CodePudding

I have a dataframe organised like so:

    x   y   e
A 0 0.0 1.0 0.01
  1 0.1 0.9 0.03
  2 0.2 1.3 0.02
...
B 0 0.0 0.5 0.02
  1 0.1 0.6 0.02
  2 0.2 0.9 0.04
...

etc.

I would like to select rows of a A/B/etc. that fall between certain values in x.

This, for example, works:

p,q=0,1
indices=df.loc[("A"),"x"].between(p,q)
df.loc[("A"),"y"][indices]

Out:
[1.0,0.9]

However, this takes two lines of code, and uses chain indexing. However, what is to me the obvious way of one-lining this doesn't work:

p,q=0,1
df.loc[("A",df[("A"),"x"].between(p,q)),"y"]

Out:
[1.0,0.9]

How can I avoid chain indexing here?

(Also, if anyone wants to explain how to make the "x" column into the indices and thereby avoid the '0,1,2' indices, feel free!)

[Edited to clarify desired output]

CodePudding user response：

You can merge your 2 lines of code by using a lambda function.

>>> df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y']

1    0.9
2    1.3
Name: y, dtype: float64

The output of your code:

indices=df.loc[("A"),"x"].between(p,q)
output=df.loc[("A"),"y"][indices]
print(output)

# Output
1    0.9
2    1.3
Name: y, dtype: float64

CodePudding user response：

You can do

cond = df['x'].between(0.05,0.15) & (df.index.get_level_values(level=0)=='A')
df[cond]
Out[284]: 
       x    y     e
A B                
A 1  0.1  0.9  0.03

CodePudding user response：

The trick here is to combine everything into a single boolean indexer. So to convert .loc['A', …] you can use df.index.get_level_values(0) == 'A' and then combine with your other conditional via &

import numpy as np
import pandas as pd
df = pd.DataFrame(
    data=np.linspace([0]*3, [1]*3 , 10, axis=0),
    index=pd.MultiIndex.from_product([['A', 'B'], range(5)]),
    columns=[*'xye'], 
)

out = df.loc[(df.index.get_level_values(0) == 'A') & (df['x'].between(.1, .4))]
print(out)
            x         y         e
A 1  0.111111  0.111111  0.111111
  2  0.222222  0.222222  0.222222
  3  0.333333  0.333333  0.333333

This is what my input data looked like:

print(df)
            x         y         e
A 0  0.000000  0.000000  0.000000
  1  0.111111  0.111111  0.111111
  2  0.222222  0.222222  0.222222
  3  0.333333  0.333333  0.333333
  4  0.444444  0.444444  0.444444
B 0  0.555556  0.555556  0.555556
  1  0.666667  0.666667  0.666667
  2  0.777778  0.777778  0.777778
  3  0.888889  0.888889  0.888889
  4  1.000000  1.000000  1.000000