I have a series of stock returns, roughly 5000 observations. I want to split them into 5 categories, each containing roughly the same number of observations.
For example, splitting the following data into 3 categories:
import pandas as pd
test = pd.DataFrame({'Returns': [0.003,0.005,0.02,0.01,0.1,0.9,-0.2,-0.13,-0.14,-0.03,0,0.001]})
should give this result when counting the rows in each category with value_counts() (12 values split 3 ways is 4 per category):
Category: number of rows
0 4
1 4
2 4
The intervals (bin widths) can differ between categories; only the number of rows per category needs to be (nearly) equal.
Answer:
Try pd.qcut, which bins by sample quantiles, so each bin gets (nearly) the same number of rows:
test['cate'] = pd.qcut(test.Returns,3).cat.codes
test['cate'].value_counts()
Output:
0 4
1 4
2 4
Name: cate, dtype: int64
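A sketch for the original 5-category case, assuming about 5000 returns (simulated here with NumPy; this is not real market data). Passing `labels=False` makes `qcut` return the integer codes directly, so `.cat.codes` is not needed. It also illustrates one caveat: when many returns are identical (for example, lots of exact zeros), some quantile edges coincide and `qcut` raises `ValueError`. Ranking first with `rank(method='first')` is one workaround, at the cost of sometimes putting identical returns into different categories.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 5000 simulated returns, NOT real market data
rng = np.random.default_rng(0)
df = pd.DataFrame({'Returns': rng.normal(0, 0.02, 5000)})

# q=5 quantile bins; labels=False returns integer codes 0..4 directly
df['cate'] = pd.qcut(df['Returns'], 5, labels=False)
counts = df['cate'].value_counts().sort_index()
print(counts)  # about 1000 rows per category

# Tie-heavy example (hypothetical): 60 exact zeros plus 40 distinct values
tied = pd.Series([0.0] * 60 + [i / 100 for i in range(1, 41)])
tied_failed = False
try:
    pd.qcut(tied, 5, labels=False)
except ValueError as err:
    # several quantile edges are all 0.0, so qcut refuses to build the bins
    tied_failed = True
    print('qcut failed:', err)

# Ranking first gives every value a unique rank (ties broken by position),
# so the quantile edges no longer collide and the bins stay equal-sized
tied_codes = pd.qcut(tied.rank(method='first'), 5, labels=False)
tied_counts = tied_codes.value_counts().sort_index()
print(tied_counts)  # 20 rows per category
```

An alternative is `duplicates='drop'`, which avoids the error but merges the colliding bins, so you may end up with fewer than 5 categories of unequal size.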