I have a dataframe with a numeric column, and I would like to calculate the percentile of the value in each row of that column, considering only previous rows of the column. Here is an example:
-------
| col_1 |
-------
| 5 |
-------
| 4 |
-------
| 10 |
-------
| 1 |
-------
| 7 |
-------
I would like to obtain a dataframe like this:
------- ------------
| col_1 | percentile |
------- ------------
| 5 | 100 |
------- ------------
| 4 | 50 |
------- ------------
| 10 | 100 |
------- ------------
| 1 | 25 |
------- ------------
| 7 | 80 |
------- ------------
How can I calculate it?
CodePudding user response:
Try as follows.
- Use df.expanding with min_periods=1 to allow expanding window calculations.
- For each window, we apply Expanding.rank with pct=True (and we multiply by 100).
We can assign the result directly to the new column percentile:
import pandas as pd
data = {'col_1':[5,4,10,1,7]}
df = pd.DataFrame(data)
df['percentile'] = df['col_1'].expanding(min_periods=1).rank(pct=True).mul(100)
print(df)
col_1 percentile
0 5 100.0
1 4 50.0
2 10 100.0
3 1 25.0
4 7 80.0
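To make it clearer what the expanding rank computes, here is a rough pure-Python cross-check. It is only a sketch: the helper name expanding_percentile is made up for illustration, and it assumes ties are ranked by their average position, which matches the default method='average' of pandas' rank.

```python
def expanding_percentile(values):
    """Percentile rank of each value among itself and all earlier values.

    Hypothetical helper for illustration; ties get the average rank.
    """
    result = []
    for i, current in enumerate(values):
        window = values[: i + 1]  # current row plus all rows before it
        smaller = sum(v < current for v in window)
        equal = sum(v == current for v in window)  # includes the current row
        # Tied values occupy ranks smaller+1 .. smaller+equal; take their mean.
        rank = smaller + (equal + 1) / 2
        result.append(100 * rank / len(window))
    return result


values = [5, 4, 10, 1, 7]
print(expanding_percentile(values))
```

For the sample data this gives the same numbers as the percentile column above. Note that each window includes the current row, which is why the first row comes out as 100.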
Update: Expanding.rank was added to pandas in version 1.4.0. For earlier versions, you could for instance try:
temp = df['col_1'].expanding(min_periods=1).agg(['rank','count'])
df['percentile'] = (temp['rank']/temp['count']).mul(100)
print(df)
col_1 percentile
0 5 100.0
1 4 50.0
2 10 100.0
3 1 25.0
4 7 80.0
Or, as a one-liner:
df['percentile'] = df['col_1'].expanding(min_periods=1)\
.apply(lambda x: (x.rank()/x.count()).to_numpy()[-1]*100)