I have this code:
y=pd.DataFrame({'num':[10,12,13,11,14]})
out = (y.join(y['num'].quantile([0.25,0.5,0.75,1])
.set_axis([f'{i}Q' for i in range(1,5)], axis=0)
.to_frame().T
.pipe(lambda x: x.loc[x.index.repeat(len(y))])
.reset_index(drop=True))
.assign(Rank=y['num'].rank(method='first'))
)
The code is working as it is but is not returning What I want. I was trying to rank num
considering only it's row so
10 is rank 1 because 10 <= 1Q value
12 is rank 2 **(not 3)** because 2Q <= 12 < 3Q value
13 is rank 3 **(not 4)** because 3Q <= 13 < 4Q value
11 is rank 1 **(not 2)** because 1Q <= 11 < 2Q value
14 is rank 4 **(not 5)** because 14>= Q4
I tried to change this line:
.assign(Rank=y['num'].rank(method='first'))
to:
.assign(Rank=y['num'].rank(axis=1,method='first'))
But it didn't work.
What am i missing here?
CodePudding user response:
Building on what you already have here:
y = y.join(y['num'].quantile([0.25,0.5,0.75,1])
.set_axis([f'{i}Q' for i in range(1,5)], axis=0)
.to_frame().T
.pipe(lambda x: x.loc[x.index.repeat(len(y))])
.reset_index(drop=True))
we could add the Rank
column as follows. The idea is to compare the num
column with the quantile columns and get the first column name where the quantile value is greater than a num
value. As it happens each quantile column already has rank numbers on it, so we use those to assign values:
y['Rank'] = (y.drop(columns='num').ge(y['num'], axis=0)
.pipe(lambda x: x*x.columns).replace('', pd.NA)
.bfill(axis=1)['1Q'].str[0].astype(int))
Output:
num 1Q 2Q 3Q 4Q Rank
0 10 11.0 12.0 13.0 14.0 1
1 12 11.0 12.0 13.0 14.0 2
2 13 11.0 12.0 13.0 14.0 3
3 11 11.0 12.0 13.0 14.0 1
4 14 11.0 12.0 13.0 14.0 4