How to handle 'interval' type values returned by pd.cut directly?

Time:08-13

I am using 'pd.cut' to split the array elements into bins and 'value_counts' to count the frequency of each bin. Here is my code and the result I get:

>>> import pandas as pd
>>> s = pd.Series([5, 9, 2, 4, 5, 6, 7, 9, 5, 3, 8, 7, 4, 6, 8])
>>> pd.cut(s, 5).value_counts()
(4.8, 6.2]      5
(7.6, 9.0]      4
(1.993, 3.4]    2
(3.4, 4.8]      2
(6.2, 7.6]      2
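For context, the index of this result is a CategoricalIndex whose entries are pandas Interval objects, and each Interval exposes `left`, `right` and `closed` attributes. A minimal check, rebuilding the same series (the name `counts` is just illustrative):

```python
import pandas as pd

s = pd.Series([5, 9, 2, 4, 5, 6, 7, 9, 5, 3, 8, 7, 4, 6, 8])
counts = pd.cut(s, 5).value_counts()

# The result's index is categorical; each element is a pd.Interval
first = counts.index[0]
print(type(counts.index).__name__)
print(first, first.left, first.right, first.closed)
```

Reading `.left` and `.right` one Interval at a time is exactly what the manual approach below does; the answer vectorises it.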

I want to get the bin edges of the first three entries in the result's index, that is:

[4.8, 6.2]
[7.6, 9.0]  
[1.993, 3.4]

or, even better, as a single flat list:

[4.8, 6.2, 7.6, 9.0, 1.993, 3.4] 

However, after searching I could not find a pandas method that handles this interval data directly, so I resorted to the following clumsy approach and then combined the values into a list or array:

v1 = pd.cut(s,5).value_counts().index[0].left
v2 = pd.cut(s,5).value_counts().index[0].right
v3 = pd.cut(s,5).value_counts().index[1].left
...
v6 = pd.cut(s,5).value_counts().index[2].right

So is there an easier way to achieve what I need?

Answer:

Convert the CategoricalIndex to an IntervalIndex, which lets you use IntervalIndex.left and IntervalIndex.right:

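counts = pd.cut(s, 5).value_counts()
# The index is a CategoricalIndex of Interval objects; convert it to an IntervalIndex
i = pd.IntervalIndex(counts.index)

# Pair each bin's left and right edge, keeping the first three bins
L1 = list(zip(i.left, i.right))[:3]
print(L1)
[(4.8, 6.2), (7.6, 9.0), (1.993, 3.4)]

# Flatten the pairs into one list
L2 = [y for x in L1 for y in x]
print(L2)
[4.8, 6.2, 7.6, 9.0, 1.993, 3.4]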
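The conversion is not the only route. Because each index element is already a pd.Interval, a list comprehension can read `.left` and `.right` directly; alternatively, NumPy can interleave the IntervalIndex edges. A sketch of both, starting again from the original Series (the names `top3`, `flat` and `flat_np` are illustrative, and the third bin depends on how value_counts orders tied counts):

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 9, 2, 4, 5, 6, 7, 9, 5, 3, 8, 7, 4, 6, 8])
counts = pd.cut(s, 5).value_counts()
top3 = counts.index[:3]  # first three bins, still Interval objects

# Variant 1: read the edges straight from each pd.Interval
flat = [float(edge) for iv in top3 for edge in (iv.left, iv.right)]

# Variant 2: convert to IntervalIndex, stack edges as (n, 2) and flatten row-wise
ii = pd.IntervalIndex(top3)
flat_np = np.column_stack([ii.left, ii.right]).ravel().tolist()

print(flat)
print(flat_np)
```

Both produce the same left/right-interleaved list; the NumPy variant avoids a Python-level loop, which only matters for many bins.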