How to group by quantile (not only calculate quantile)

For example, I have a column [1,2,3,4,5,6,7,8,9,10] and the quantiles [0, 0.33, 0.66, 1] (the number of quantiles is not fixed). The df should be split into 3 groups; the group names don't matter.

Is a for-loop the only way?

CodePudding user response:

No loop is needed. You can combine Series.quantile, pd.cut, and groupby to do what you're looking for.

In [1]: import pandas as pd

In [2]: s = pd.Series([1,2,3,4,5,6,7,8,9,10])

Use quantile to find the cut points:

In [3]: qs = s.quantile([0, 0.33, 0.66, 1])

Now you can use cut to assign each element to a bin, using the quantiles as your bin edges:

In [8]: pd.cut(s, bins=qs, include_lowest=True)
Out[8]:
0    (0.999, 3.97]
1    (0.999, 3.97]
2    (0.999, 3.97]
3     (3.97, 6.94]
4     (3.97, 6.94]
5     (3.97, 6.94]
6     (6.94, 10.0]
7     (6.94, 10.0]
8     (6.94, 10.0]
9     (6.94, 10.0]
dtype: category
Categories (3, interval[float64, right]): [(0.999, 3.97] < (3.97, 6.94] < (6.94, 10.0]]
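Since the group names don't matter to you, here is a minimal sketch of the same cut with `labels=False`, which should return plain integer bin indices instead of intervals. The variable name `codes` is just illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Quantile values become the bin edges, passed here as a plain array
qs = s.quantile([0, 0.33, 0.66, 1])

# labels=False gives the integer position of each element's bin
codes = pd.cut(s, bins=qs.to_numpy(), labels=False, include_lowest=True)
print(codes.tolist())
```

The integer codes can be passed to groupby in the same way as the interval categories.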

You can use the results of cut directly in a groupby operation, e.g. groupby.mean:

In [9]: s.groupby(pd.cut(s, bins=qs, include_lowest=True)).mean()
Out[9]:
(0.999, 3.97]    2.0
(3.97, 6.94]     5.0
(6.94, 10.0]     8.5
dtype: float64
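As an alternative to the quantile + cut steps above, pd.qcut accepts a list of quantiles directly and does both in one call. The sketch below applies it to a DataFrame, as in the question. The column names `value` and `other` and the data in `other` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: "value" is the column to bin, "other" is the data to aggregate
df = pd.DataFrame({
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "other": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

quantiles = [0, 0.33, 0.66, 1]  # any length works

# qcut computes the quantile edges from the data and bins in one step;
# labels=False yields integer group ids instead of intervals
groups = pd.qcut(df["value"], q=quantiles, labels=False)

# Group the whole frame by the bin ids and aggregate another column
result = df.groupby(groups)["other"].mean()
print(result)
```

If the data contains many repeated values, two quantile edges can coincide and qcut raises an error by default; passing `duplicates="drop"` merges the duplicate edges, at the cost of getting fewer groups.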