For example, I have a column [1,2,3,4,5,6,7,8,9,10] and the quantiles [0, 0.33, 0.66, 1] (the length is not fixed). The df should be grouped into 3 groups (the group names don't matter).
Is a for-loop the only way to do this?
CodePudding user response:
You can use a combination of Series.quantile, pd.cut, and groupby to do what you're looking for.
In [1]: import pandas as pd, numpy as np
In [2]: s = pd.Series([1,2,3,4,5,6,7,8,9,10])
Use quantile to find the cut points:
In [3]: qs = s.quantile([0, 0.33, 0.66, 1])
Now you can use cut to assign each element to a bin, using the quantiles as your bin edges:
In [8]: pd.cut(s, bins=qs, include_lowest=True)
Out[8]:
0    (0.999, 3.97]
1    (0.999, 3.97]
2    (0.999, 3.97]
3     (3.97, 6.94]
4     (3.97, 6.94]
5     (3.97, 6.94]
6     (6.94, 10.0]
7     (6.94, 10.0]
8     (6.94, 10.0]
9     (6.94, 10.0]
dtype: category
Categories (3, interval[float64, right]): [(0.999, 3.97] < (3.97, 6.94] < (6.94, 10.0]]
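The 0.999 lower edge in that output comes from include_lowest=True: bins are right-closed by default, so without it the minimum value sits exactly on the open left edge and falls outside every bin. A minimal sketch of the difference, assuming the same series as above (the variable names edges, without_lowest and with_lowest are only illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Bin edges taken from the quantiles, as a plain list of floats.
edges = s.quantile([0, 0.33, 0.66, 1]).tolist()

# Default behaviour: intervals are (left, right], so the minimum (1)
# is not inside the first bin and becomes NaN.
without_lowest = pd.cut(s, bins=edges)

# include_lowest=True nudges the first edge down so the minimum is kept.
with_lowest = pd.cut(s, bins=edges, include_lowest=True)

print(without_lowest.isna().sum())  # number of unbinned values
print(with_lowest.isna().sum())
```

If rows silently disappear from a later groupby, a NaN bin like this is a likely cause.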
You can use the results of cut directly in a groupby operation, e.g. groupby.mean:
In [9]: s.groupby(pd.cut(s, bins=qs, include_lowest=True)).mean()
Out[9]:
(0.999, 3.97]    2.0
(3.97, 6.94]     5.0
(6.94, 10.0]     8.5
dtype: float64
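Since the question is about a df column and the group names don't matter, pd.qcut may be a shorter route: it accepts the list of quantiles directly and does the quantile and cut steps in one call. This is a sketch rather than the only way to do it; the frame, the column name "value" and the new column "group" are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame and column name, chosen only for illustration.
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
quantiles = [0, 0.33, 0.66, 1]  # any length, as long as it is increasing

# qcut computes the quantile edges and assigns the bins in one step.
# labels=False returns integer bin codes (0, 1, 2, ...) instead of intervals.
df["group"] = pd.qcut(df["value"], q=quantiles, labels=False)

# Any groupby aggregation can then use the new column.
group_means = df.groupby("group")["value"].mean()
print(group_means)
```

If the data has many repeated values, two quantiles can land on the same edge and qcut raises a ValueError; passing duplicates="drop" merges those edges, at the cost of ending up with fewer groups than requested.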