In R
I have some data:
library(tidyverse)
d = data.frame(
"A" = c(0.1, 0.76, 0.54, 0.79, 0.28),
"B" = c(3, 2, 4, 3, 4),
"C" = c(2, 2, 7, 3, 4),
"D" = c(4, 3, 6, 12, 7)
)
Where I can apply across each column a computation to compute the ntiles
:
d %>%
mutate(across(.cols = everything(),
.fns = ~ ntile(., 5),
.names = "{.col}_ntiles")
)
Which gives me the following output - Desired output:
A B C D A_ntiles B_ntiles C_ntiles D_ntiles
1 0.10 3 2 4 1 2 1 2
2 0.76 2 2 3 4 1 2 1
3 0.54 4 7 6 3 4 5 3
4 0.79 3 3 12 5 3 3 5
5 0.28 4 4 7 2 5 4 4
I am trying to do this same task in Python.
I managed to get as far as the following:
import pandas as pd
d = pd.DataFrame({"A":[0.1, 0.76, 0.54, 0.79, 0.28],
"B":[3, 2, 4, 3, 4],
"C":[2, 2, 7, 3, 4],
"D":[4, 3, 6, 12, 7]})
d.quantile([0.2, 0.4, 0.6, 0.8], axis = 0)
Which gives me:
A B C D
0.2 0.244 2.8 2.0 3.8
0.4 0.436 3.0 2.6 5.2
0.6 0.628 3.4 3.4 6.4
0.8 0.766 4.0 4.6 8.0
However, I would like to obtain which rank category each ntile
belongs to - just as in the R
example using A_ntiles
, B_ntiles
... D_ntiles
.
CodePudding user response:
You can check with rank
out = d.join(d.rank(method='first').add_suffix('_ntile').astype(int))
out
Out[92]:
A B C D A_ntile B_ntile C_ntile D_ntile
0 0.10 3 2 4 1 2 1 2
1 0.76 2 2 3 4 1 2 1
2 0.54 4 7 6 3 4 5 3
3 0.79 3 3 12 5 3 3 5
4 0.28 4 4 7 2 5 4 4
CodePudding user response:
Update
rank_quantile = lambda x: pd.qcut(x, q=5, labels=False, duplicates='drop')
out = pd.concat([d, d.apply(rank_quantile).add(1).add_suffix('_nqtiles')], axis=1)
print(out)
# Output
A B C D A_nqtiles B_nqtiles C_nqtiles D_nqtiles
0 0.10 3 2 4 1 2 1 2
1 0.76 2 2 3 4 1 1 1
2 0.54 4 7 6 3 4 4 3
3 0.79 3 3 12 5 2 2 5
4 0.28 4 4 7 2 4 3 4
Old answer
Use rank
with method='first'
then concatenate your dataframe:
out = pd.concat([d, d.rank(method='first').astype(int).add_suffix('_ntile')], axis=1)
print(out)
# Output
A B C D A_ntile B_ntile C_ntile D_ntile
0 0.10 3 2 4 1 2 1 2
1 0.76 2 2 3 4 1 2 1
2 0.54 4 7 6 3 4 5 3
3 0.79 3 3 12 5 3 3 5
4 0.28 4 4 7 2 5 4 4