Home > Back-end >  Filtering dataframe on a column of type interval[float64]
Filtering dataframe on a column of type interval[float64]

Time:10-27

I have a dataframe in which one of the columns is of type interval[float64] and I wish to filter that dataframe to retreive on of the bins:

distance_bin  object:s in bin  percentage        Date
0   (-0.001, 1.0]          1092       46.00  2021-06-20
1      (1.0, 2.0]           533       22.45  2021-06-20
2      (2.0, 3.0]           257       10.83  2021-06-20
3      (3.0, 4.0]            46        1.94  2021-06-20
4      (4.0, 5.0]           144        6.07  2021-06-20
5      (5.0, 6.0]           117        4.93  2021-06-20
6      (6.0, 7.0]            58        2.44  2021-06-20
7      (7.0, 8.0]            46        1.94  2021-06-20
8      (8.0, 9.0]            48        2.02  2021-06-20
9     (9.0, 10.0]            16        0.67  2021-06-20
10   (10.0, 11.0]             6        0.25  2021-06-20
11   (11.0, 12.0]             8        0.34  2021-06-20
12   (12.0, 13.0]             1        0.04  2021-06-20
13   (15.0, 16.0]             1        0.04  2021-06-20
14   (21.0, 22.0]             1        0.04  2021-06-20
15  (-0.001, 1.0]          1080       45.55  2021-06-21
16     (1.0, 2.0]           546       23.03  2021-06-21
17     (2.0, 3.0]           245       10.33  2021-06-21
18     (3.0, 4.0]            54        2.28  2021-06-21
19     (4.0, 5.0]           150        6.33  2021-06-21
20     (5.0, 6.0]           104        4.39  2021-06-21
21     (6.0, 7.0]            44        1.86  2021-06-21
22     (7.0, 8.0]            51        2.15  2021-06-21
23     (8.0, 9.0]            38        1.60  2021-06-21
24    (9.0, 10.0]            36        1.52  2021-06-21
25   (10.0, 11.0]             7        0.30  2021-06-21
26   (11.0, 12.0]            15        0.63  2021-06-21
27   (12.0, 13.0]             1        0.04  2021-06-21
28  (-0.001, 1.0]          1094       46.24  2021-06-22
29     (1.0, 2.0]           517       21.85  2021-06-22
30     (2.0, 3.0]           289       12.21  2021-06-22
31     (3.0, 4.0]            42        1.78  2021-06-22
32     (4.0, 5.0]           139        5.87  2021-06-22
33     (5.0, 6.0]            98        4.14  2021-06-22
34     (6.0, 7.0]            43        1.82  2021-06-22
35     (7.0, 8.0]            47        1.99  2021-06-22
36     (8.0, 9.0]            46        1.94  2021-06-22
37    (9.0, 10.0]            30        1.27  2021-06-22
38   (10.0, 11.0]             6        0.25  2021-06-22
39   (11.0, 12.0]            15        0.63  2021-06-22

To give you a way to transform the interval in the right datatype, I provide you with this code:

def interval_type(s):
    table = str.maketrans({'[': '(', ']': ')'})
    left_closed = s.startswith('[')
    right_closed = s.endswith(']')

    left, right = ast.literal_eval(s.translate(table))

    t = 'neither'
    if left_closed and right_closed:
        t = 'both'
    elif left_closed:
        t = 'left'
    elif right_closed:
        t = 'right'

    return pd.Interval(left, right, closed=t)

df['distance_bin'] = df['distance_bin'].apply(interval_type)

Now, I thought that to filter df to only keep the interval (5.0, 6.0] it would be enough to do the following thing:

df_A = df_smh[df['distance_bin']=='(5.0, 6.0]']

But it returns an empty dataframe. I tried a multitude of other things, like:

df_A = df_smh[df.distance_bin.left>5 & df.distance_bin.right<=6]

but this returns:

AttributeError: 'Series' object has no attribute 'left'

Any guidance here would be greatly appreciated.

CodePudding user response:

For me working comparing by Interval like:

df = pd.DataFrame({'distance_bin':[pd.Interval(-0.001, 1.0, closed='right'), 
                         pd.Interval(1.0, 2.0, closed='right'), 
                         pd.Interval(2.0, 3.0, closed='right'), 
                         pd.Interval(3.0, 4.0, closed='right'), 
                         pd.Interval(4.0, 5.0, closed='right')]})

df = df[df['distance_bin']==pd.Interval(1.0, 2.0, closed='right')]
print (df)
  distance_bin
1   (1.0, 2.0]
  • Related