I am trying to understand how to create a table of data that I have divided into bins using pandas.cut,
with the data ranges listed in the right order.
I use the following code to generate random ages:
import numpy as np
import pandas as pd
ages = np.random.standard_normal(1000)*20 + 30
ages[ages<0]=0
ages[ages>120]=120
I bin the data using these lines:
ages = pd.Series(ages, dtype=int)
ages_cut = pd.cut(ages,[0,20,40,60,80,100,120])
However, when I use ages_cut.value_counts()
I get a table with the age ranges in the wrong order (sorted by count rather than by bin):
(20, 40] 379
(0, 20] 268
(40, 60] 233
(60, 80] 56
(80, 100] 3
(100, 120] 0
dtype: int64
CodePudding user response:
In addition to @QuangHoang's comment, you can call value_counts directly on the ages Series with the bins parameter and sort=False:
bins : int, optional
Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.
>>> ages.value_counts(bins=[0,20,40,60,80,100,120], sort=False)
(-0.001, 20.0] 334
(20.0, 40.0] 382
(40.0, 60.0] 224
(60.0, 80.0] 54
(80.0, 100.0] 6
(100.0, 120.0] 0
dtype: int64
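If you would rather keep the pd.cut step, a minimal sketch of an alternative is to sort the counts by their index, since the categories produced by pd.cut are ordered. The fixed seed, the use of np.clip, and include_lowest=True (so that ages of exactly 0 land in the first bin) are my own assumptions, not part of the original question:

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible (an assumption, not in the question)
rng = np.random.default_rng(0)
ages = rng.standard_normal(1000) * 20 + 30
ages = np.clip(ages, 0, 120)          # same effect as the two masking lines
ages = pd.Series(ages.astype(int))    # truncate to whole years

bins = [0, 20, 40, 60, 80, 100, 120]
# include_lowest=True keeps ages equal to 0 in the first bin
ages_cut = pd.cut(ages, bins, include_lowest=True)

# value_counts sorts by frequency by default; sort_index restores bin order
counts = ages_cut.value_counts().sort_index()
print(counts)
```

Without include_lowest=True the first bin is (0, 20], so ages of exactly 0 become NaN and are dropped from the counts.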