Categorize data into N categories where each category has the same number of data points but different intervals

Time:07-30

I have a series of stock returns, roughly 5,000 data points, and I want to categorize them into 5 categories. Each category should contain almost the same number of data points.

For example, categorize the following data into 3 categories:

import pandas as pd

test = pd.DataFrame({'Returns': [0.003,0.005,0.02,0.01,0.1,0.9,-0.2,-0.13,-0.14,-0.03,0,0.001]})

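After assigning a category to each return, counting the rows per category should give something like this (12 values / 3 categories = 4 each):

test['Category'].value_counts()

Category    Number of data points
0           4
1           4
2           4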

The intervals (bin widths) of the categories can differ; only the counts need to be equal.

Answer:

Try pd.qcut, which bins the data by sample quantiles so that each bin holds (about) the same number of values:

test['cate'] = pd.qcut(test.Returns,3).cat.codes
test['cate'].value_counts()
Out[577]: 
0    4
1    4
2    4
Name: cate, dtype: int64
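A minimal sketch scaled up to the original case of about 5,000 returns and 5 categories. The returns here are synthetic, normally distributed stand-ins (an assumption, not real data). Passing labels=False makes qcut return the integer codes directly, which is equivalent to .cat.codes above, and retbins=True also returns the bin edges so you can see that the interval widths differ. The rank(method="first") variant is a commonly used workaround for heavily tied data (for example many exact 0.0 returns), where plain qcut can fail with "Bin edges must be unique".

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ~5000 stock returns (synthetic data, not from the question).
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.02, size=5000), name="Returns")

# Quantile-based binning into 5 categories with (almost) equal counts.
# labels=False -> integer codes 0..4; retbins=True -> also return the bin edges.
codes, edges = pd.qcut(returns, 5, labels=False, retbins=True)
print(codes.value_counts().sort_index())  # count per category
print(np.diff(edges))                     # widths of the 5 intervals (they differ)

# With many identical values, quantile edges can coincide and qcut raises
# "Bin edges must be unique". Ranking first breaks ties by order of appearance,
# so each category gets exactly len(returns) / 5 values. Tied returns may then
# land in different categories, which is an arbitrary but count-preserving choice.
tie_safe = pd.qcut(returns.rank(method="first"), 5, labels=False)
print(tie_safe.value_counts().sort_index())
```

An alternative is pd.qcut(..., duplicates="drop"), which merges the duplicate edges instead, but it then produces fewer than 5 categories.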