Find number of datapoints in each range

Time:12-01

I have a data frame that looks like this:

import pandas as pd

data = [['A', 0.20], ['B', 0.25], ['C', 0.11], ['D', 0.30], ['E', 0.29]]
df = pd.DataFrame(data, columns=['col1', 'col2'])

col1 is a primary key (each row has a unique value).

The max of col2 is 1 and the min is 0. I want to find the number of datapoints in the ranges 0-0.30 (both 0 and 0.30 included), 0-0.29, 0-0.28, and so on down to 0-0.01. I could use pd.cut, but its bins each have a different lower limit, whereas my lower limit is always 0. Can someone help?
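To pin down exactly what is being counted, here is a direct, loop-style sketch (assuming pandas and the example data above; `limits` and `counts` are illustrative names). It is slower than a vectorized answer because it makes one pandas call per limit, but it spells out the closed range [0, limit] for every upper limit.

```python
import pandas as pd

data = [['A', 0.20], ['B', 0.25], ['C', 0.11], ['D', 0.30], ['E', 0.29]]
df = pd.DataFrame(data, columns=['col1', 'col2'])

# Upper limits 0.01, 0.02, ..., 0.30, built from integers so each one
# equals the corresponding decimal literal (no accumulated float error)
limits = [i / 100 for i in range(1, 31)]

# For each limit, count values in the closed range [0, limit];
# Series.between includes both endpoints by default
counts = {lim: int(df['col2'].between(0, lim).sum()) for lim in limits}
```

With the sample data, `counts[0.3]` is 5 and `counts[0.29]` is 4, which any faster approach should reproduce.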

Answer:

One option using numpy broadcasting:

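import numpy as np
import pandas as pd

step = 0.01
# upper limits 0.00, 0.01, ..., 0.30 (rounded to avoid float drift from arange)
up = np.arange(0, 0.3 + step, step).round(2)
# (n_rows x n_limits) boolean matrix; column sums count values <= each limit
out = pd.Series((df['col2'].to_numpy()[:, None] <= up).sum(axis=0), index=up)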

Output:

0.00    0
0.01    0
0.02    0
0.03    0
0.04    0
0.05    0
0.06    0
0.07    0
0.08    0
0.09    0
0.10    0
0.11    1
0.12    1
0.13    1
0.14    1
0.15    1
0.16    1
0.17    1
0.18    1
0.19    1
0.20    2
0.21    2
0.22    2
0.23    2
0.24    2
0.25    3
0.26    3
0.27    3
0.28    3
0.29    4
0.30    5
dtype: int64
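A small sketch of what the broadcasting step does, using the question's values as a plain NumPy array (`values`, `mask`, `counts` are illustrative names). This sketch rounds the thresholds with `.round(2)` because `np.arange` with a float step can produce values that differ from the decimal literal in the last bit, which could flip a `<=` comparison at a boundary such as 0.29.

```python
import numpy as np

values = np.array([0.20, 0.25, 0.11, 0.30, 0.29])
up = np.arange(0, 0.3 + 0.01, 0.01).round(2)

# values[:, None] has shape (5, 1) and up has shape (31,),
# so the comparison broadcasts to a (5, 31) boolean matrix
mask = values[:, None] <= up

# Column j is True for every value <= up[j]; summing down the rows
# gives the number of values in [0, up[j]]
counts = mask.sum(axis=0)
```

The memory cost is one boolean per (row, limit) pair, which is fine for a few dozen limits but worth keeping in mind for very large frames.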

With pandas.cut and cumsum:

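import numpy as np
import pandas as pd

step = 0.01
up = np.arange(0, 0.3 + step, step).round(2)
# bins are (0, 0.01], (0.01, 0.02], ..., each labeled by its upper limit;
# include_lowest=True closes the first bin on the left so values equal to 0 are counted
(pd.cut(df['col2'], up, labels=up[1:], include_lowest=True)
   .value_counts(sort=False).cumsum()  # running total = count in [0, limit]
)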

Output:

0.01    0
0.02    0
0.03    0
0.04    0
0.05    0
0.06    0
0.07    0
0.08    0
0.09    0
0.1     0
0.11    1
0.12    1
0.13    1
0.14    1
0.15    1
0.16    1
0.17    1
0.18    1
0.19    1
0.2     2
0.21    2
0.22    2
0.23    2
0.24    2
0.25    3
0.26    3
0.27    3
0.28    3
0.29    4
0.3     5
Name: col2, dtype: int64
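Because the question requires 0 itself to be counted, note that `pd.cut` bins are right-closed by default, so the first bin is (0, 0.01] and a value of exactly 0 falls outside every bin (it becomes NaN and `value_counts` drops it). The sketch below, using a hypothetical three-value Series `s`, compares the default with `include_lowest=True`, which closes the first bin on the left.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 0.11, 0.30])
up = np.arange(0, 0.3 + 0.01, 0.01).round(2)

# Default: right-closed bins starting at (0, 0.01], so the exact 0 is not counted
default = pd.cut(s, up, labels=up[1:]).value_counts(sort=False).cumsum()

# include_lowest=True puts the exact 0 into the first bin
inclusive = (pd.cut(s, up, labels=up[1:], include_lowest=True)
             .value_counts(sort=False).cumsum())
```

Here the last entry of `default` is 2 while the last entry of `inclusive` is 3; the sample data in the question has no zeros, so its output is the same either way.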