I am using the following code to generate the rank column,
df["rank"] = df.groupby(['group1','userId'])[['rank_level1','rank_level2']].rank(method='first', ascending=True).astype(int)
but as you can see in the following example data it is generating the wrong order of ranking considering rank_level2 column
expected_Rank is the ranking order I am expecting
group1 | userId | geoId | rank_level1 | rank_level2 | rank | expected_Rank |
---|---|---|---|---|---|---|
a | 1 | q | 3 | 3.506102795 | 1 | 8 |
a | 1 | w | 3 | -9.359613563 | 2 | 2 |
a | 1 | e | 3 | -2.368458072 | 3 | 3 |
a | 1 | r | 3 | 13.75731938 | 4 | 9 |
a | 1 | t | 3 | 0.229777761 | 5 | 5 |
a | 1 | y | 3 | -10.25124866 | 6 | 1 |
a | 1 | u | 3 | 2.82822285 | 7 | 7 |
a | 1 | i | 3 | 0 | 8 | 4 |
a | 1 | o | 3 | 1.120593402 | 9 | 6 |
a | 1 | p | 4 | 1.98 | 10 | 10 |
a | 1 | z | 4 | 5.110299374 | 11 | 11 |
b | 1 | p | 2 | -9.552317622 | 1 | 1 |
b | 1 | r | 3 | 1.175083121 | 2 | 6 |
b | 1 | t | 3 | 0 | 3 | 5 |
b | 1 | o | 3 | 9.383253146 | 4 | 8 |
b | 1 | w | 3 | 5.782528196 | 5 | 7 |
b | 1 | i | 3 | -0.680999413 | 6 | 4 |
b | 1 | y | 3 | -0.990387248 | 7 | 3 |
b | 1 | e | 3 | -11.18793533 | 8 | 2 |
b | 1 | z | 3 | 12.33791512 | 9 | 9 |
b | 1 | u | 4 | -4.799979138 | 10 | 11 |
b | 1 | q | 4 | -25.92 | 11 | 10 |
CodePudding user response:
Create tuples by both columns and then use GroupBy.transform
with Series.rank
and method='dense'
:
df["rank"] = (df.assign(new=df[['rank_level1','rank_level2']].agg(tuple, 1))
.groupby(['group1','userId'])['new']
.transform(lambda x: x.rank(method='dense', ascending=True))
.astype(int))
print (df)
group1 userId geoId rank_level1 rank_level2 rank expected_Rank
0 a 1 q 3 3.506103 8 8
1 a 1 w 3 -9.359614 2 2
2 a 1 e 3 -2.368458 3 3
3 a 1 r 3 13.757319 9 9
4 a 1 t 3 0.229778 5 5
5 a 1 y 3 -10.251249 1 1
6 a 1 u 3 2.828223 7 7
7 a 1 i 3 0.000000 4 4
8 a 1 o 3 1.120593 6 6
9 a 1 p 4 1.980000 10 10
10 a 1 z 4 5.110299 11 11
11 b 1 p 2 -9.552318 1 1
12 b 1 r 3 1.175083 6 6
13 b 1 t 3 0.000000 5 5
14 b 1 o 3 9.383253 8 8
15 b 1 w 3 5.782528 7 7
16 b 1 i 3 -0.680999 4 4
17 b 1 y 3 -0.990387 3 3
18 b 1 e 3 -11.187935 2 2
19 b 1 z 3 12.337915 9 9
20 b 1 u 4 -4.799979 11 11
21 b 1 q 4 -25.920000 10 10
because:
df["rank"] = df.assign(new=df[['rank_level1','rank_level2']].agg(tuple, 1)).groupby(['group1','userId'])['new'].rank(method='first', ascending=True).astype(int)
DataError: No numeric types to aggregate