I am using 'pd.cut' to split the elements of an array into bins and 'value_counts' to count the frequency of each bin. My code and the result look like this:
import pandas as pd
s = pd.Series([5,9,2,4,5,6,7,9,5,3,8,7,4,6,8])
pd.cut(s,5).value_counts()
(4.8, 6.2] 5
(7.6, 9.0] 4
(1.993, 3.4] 2
(3.4, 4.8] 2
(6.2, 7.6] 2
I want to get the interval endpoints from the first three entries of the result's index, that is:
[4.8, 6.2]
[7.6, 9.0]
[1.993, 3.4]
or, even better:
[4.8, 6.2, 7.6, 9.0, 1.993, 3.4]
I searched around and found that pandas does not seem to have a method that handles this interval data directly, so I fell back on the following clumsy approach and then combined the values into a list or array:
v1 = pd.cut(s,5).value_counts().index[0].left
v2 = pd.cut(s,5).value_counts().index[0].right
v3 = pd.cut(s,5).value_counts().index[1].left
...
v6 = pd.cut(s,5).value_counts().index[2].right
So is there an easier way to achieve what I need?
CodePudding user response:
Convert the CategoricalIndex to an IntervalIndex, so you can use IntervalIndex.left and IntervalIndex.right:
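# Count per bin; the resulting index is a CategoricalIndex of intervals
counts = pd.cut(s,5).value_counts()
# Convert it so that .left and .right work on the whole index at once
i = pd.IntervalIndex(counts.index)
# Endpoint pairs of the first three bins
L1 = list(zip(i.left, i.right))[:3]
print(L1)
[(4.8, 6.2), (7.6, 9.0), (1.993, 3.4)]
# Flatten the pairs into a single list
L2 = [y for x in L1 for y in x]
print(L2)
[4.8, 6.2, 7.6, 9.0, 1.993, 3.4]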
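If this is needed in more than one place, the same idea can be wrapped in a small helper. The following is only a sketch: the function name top_bin_edges and its parameters are made up for illustration and are not part of pandas, and it assumes the input is a numeric Series. Note that the order of bins with equal counts may differ between pandas versions, so the third pair is not guaranteed to match the output shown above.

```python
import numpy as np
import pandas as pd


def top_bin_edges(values, bins, n):
    """Hypothetical helper: edges of the n most frequent bins from pd.cut."""
    # Bin the values and count how many fall into each bin (most frequent first)
    counts = pd.cut(values, bins).value_counts()
    # The index is categorical; convert the first n entries to an IntervalIndex
    intervals = pd.IntervalIndex(counts.index[:n])
    # One row per bin: [left edge, right edge]
    pairs = np.column_stack([intervals.left.to_numpy(), intervals.right.to_numpy()])
    # Return both the nested form and the flattened form
    return pairs.tolist(), pairs.ravel().tolist()


s = pd.Series([5, 9, 2, 4, 5, 6, 7, 9, 5, 3, 8, 7, 4, 6, 8])
pairs, flat = top_bin_edges(s, 5, 3)
print(pairs)  # nested [left, right] pairs
print(flat)   # the same edges as one flat list
```

Building the IntervalIndex once also avoids calling pd.cut and value_counts again for every single endpoint, as the v1 ... v6 approach in the question does.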