I am new to pandas, and I want to figure out how to group values based on sample quantiles.
For example, I have a dataframe with a column a
.
df = pd.DataFrame(np.random.randint(0,100,size=(100, 1)), columns=list('a'))
Then what I want to do is to divide the values in a
into 10 different group by their deciles, and named the label of their group in a column b
.
Which means in the new column:
a | b | |
---|---|---|
60 | 2 | group 1 |
30 | 3 | group 1 |
94 | 3 | group 1 |
92 | 3 | group 1 |
63 | 3 | group 1 |
... | ... | ... |
47 | 92 | group 10 |
58 | 98 | group 10 |
66 | 99 | group 10 |
73 | 99 | group 10 |
24 | 100 | group 10 |
Is that possible?
Now what I do is get the deciles of a
first as df2
, then merge the df2
to the existing df
. A row is like
a | decile 1 | decile 2 | ... | decile 10 | |
---|---|---|---|---|---|
0 | 1 | 5.5 | 18 | ... | 100 |
And get their group by compare them with deciles by a lot of if else condition. It is ok for like quartiles, but almost impossible for like percentiles.
I hope I explain my questions clearly enough. If there is any misleading expression, please let me know.
Thanks for any help in advance!
CodePudding user response:
First part answer is subtract 1 with integer division by 10
and add 1
for start groups from 1
:
df = pd.DataFrame({'a':range(1,101)})
df['b'] = 'group ' (df.a.sub(1) // 10 1).astype(str)
print(df)
a b
0 1 group 1
1 2 group 1
2 3 group 1
3 4 group 1
4 5 group 1
.. ... ...
95 96 group 10
96 97 group 10
97 98 group 10
98 99 group 10
99 100 group 10
EDIT: For deciles use qcut
:
df['b'] = pd.qcut(df.a, 10, labels=False)