I have a dataframe in which one of the columns is of type interval[float64]
and I wish to filter that dataframe to retreive on of the bins:
distance_bin object:s in bin percentage Date
0 (-0.001, 1.0] 1092 46.00 2021-06-20
1 (1.0, 2.0] 533 22.45 2021-06-20
2 (2.0, 3.0] 257 10.83 2021-06-20
3 (3.0, 4.0] 46 1.94 2021-06-20
4 (4.0, 5.0] 144 6.07 2021-06-20
5 (5.0, 6.0] 117 4.93 2021-06-20
6 (6.0, 7.0] 58 2.44 2021-06-20
7 (7.0, 8.0] 46 1.94 2021-06-20
8 (8.0, 9.0] 48 2.02 2021-06-20
9 (9.0, 10.0] 16 0.67 2021-06-20
10 (10.0, 11.0] 6 0.25 2021-06-20
11 (11.0, 12.0] 8 0.34 2021-06-20
12 (12.0, 13.0] 1 0.04 2021-06-20
13 (15.0, 16.0] 1 0.04 2021-06-20
14 (21.0, 22.0] 1 0.04 2021-06-20
15 (-0.001, 1.0] 1080 45.55 2021-06-21
16 (1.0, 2.0] 546 23.03 2021-06-21
17 (2.0, 3.0] 245 10.33 2021-06-21
18 (3.0, 4.0] 54 2.28 2021-06-21
19 (4.0, 5.0] 150 6.33 2021-06-21
20 (5.0, 6.0] 104 4.39 2021-06-21
21 (6.0, 7.0] 44 1.86 2021-06-21
22 (7.0, 8.0] 51 2.15 2021-06-21
23 (8.0, 9.0] 38 1.60 2021-06-21
24 (9.0, 10.0] 36 1.52 2021-06-21
25 (10.0, 11.0] 7 0.30 2021-06-21
26 (11.0, 12.0] 15 0.63 2021-06-21
27 (12.0, 13.0] 1 0.04 2021-06-21
28 (-0.001, 1.0] 1094 46.24 2021-06-22
29 (1.0, 2.0] 517 21.85 2021-06-22
30 (2.0, 3.0] 289 12.21 2021-06-22
31 (3.0, 4.0] 42 1.78 2021-06-22
32 (4.0, 5.0] 139 5.87 2021-06-22
33 (5.0, 6.0] 98 4.14 2021-06-22
34 (6.0, 7.0] 43 1.82 2021-06-22
35 (7.0, 8.0] 47 1.99 2021-06-22
36 (8.0, 9.0] 46 1.94 2021-06-22
37 (9.0, 10.0] 30 1.27 2021-06-22
38 (10.0, 11.0] 6 0.25 2021-06-22
39 (11.0, 12.0] 15 0.63 2021-06-22
To give you a way to transform the interval in the right datatype, I provide you with this code:
def interval_type(s):
table = str.maketrans({'[': '(', ']': ')'})
left_closed = s.startswith('[')
right_closed = s.endswith(']')
left, right = ast.literal_eval(s.translate(table))
t = 'neither'
if left_closed and right_closed:
t = 'both'
elif left_closed:
t = 'left'
elif right_closed:
t = 'right'
return pd.Interval(left, right, closed=t)
df['distance_bin'] = df['distance_bin'].apply(interval_type)
Now, I thought that to filter df to only keep the interval (5.0, 6.0]
it would be enough to do the following thing:
df_A = df_smh[df['distance_bin']=='(5.0, 6.0]']
But it returns an empty dataframe. I tried a multitude of other things, like:
df_A = df_smh[df.distance_bin.left>5 & df.distance_bin.right<=6]
but this returns:
AttributeError: 'Series' object has no attribute 'left'
Any guidance here would be greatly appreciated.
CodePudding user response:
For me working comparing by Interval
like:
df = pd.DataFrame({'distance_bin':[pd.Interval(-0.001, 1.0, closed='right'),
pd.Interval(1.0, 2.0, closed='right'),
pd.Interval(2.0, 3.0, closed='right'),
pd.Interval(3.0, 4.0, closed='right'),
pd.Interval(4.0, 5.0, closed='right')]})
df = df[df['distance_bin']==pd.Interval(1.0, 2.0, closed='right')]
print (df)
distance_bin
1 (1.0, 2.0]