Home > Enterprise >  pandas dataframe groupby rank generates unexpected order of ranking
pandas dataframe groupby rank generates unexpected order of ranking

Time:06-10

I am using the following code to generate the rank column,

df["rank"] = df.groupby(['group1','userId'])[['rank_level1','rank_level2']].rank(method='first', ascending=True).astype(int)

but as you can see in the following example data it is generating the wrong order of ranking considering rank_level2 column

expected_Rank is the ranking order I am expecting

group1 userId geoId rank_level1 rank_level2 rank expected_Rank
a 1 q 3 3.506102795 1 8
a 1 w 3 -9.359613563 2 2
a 1 e 3 -2.368458072 3 3
a 1 r 3 13.75731938 4 9
a 1 t 3 0.229777761 5 5
a 1 y 3 -10.25124866 6 1
a 1 u 3 2.82822285 7 7
a 1 i 3 0 8 4
a 1 o 3 1.120593402 9 6
a 1 p 4 1.98 10 10
a 1 z 4 5.110299374 11 11
b 1 p 2 -9.552317622 1 1
b 1 r 3 1.175083121 2 6
b 1 t 3 0 3 5
b 1 o 3 9.383253146 4 8
b 1 w 3 5.782528196 5 7
b 1 i 3 -0.680999413 6 4
b 1 y 3 -0.990387248 7 3
b 1 e 3 -11.18793533 8 2
b 1 z 3 12.33791512 9 9
b 1 u 4 -4.799979138 10 11
b 1 q 4 -25.92 11 10

CodePudding user response:

Create tuples by both columns and then use GroupBy.transform with Series.rank and method='dense':

df["rank"] = (df.assign(new=df[['rank_level1','rank_level2']].agg(tuple, 1))
                .groupby(['group1','userId'])['new']
                .transform(lambda x: x.rank(method='dense', ascending=True))
                .astype(int))

print (df)
   group1  userId geoId  rank_level1  rank_level2  rank  expected_Rank
0       a       1     q            3     3.506103     8              8
1       a       1     w            3    -9.359614     2              2
2       a       1     e            3    -2.368458     3              3
3       a       1     r            3    13.757319     9              9
4       a       1     t            3     0.229778     5              5
5       a       1     y            3   -10.251249     1              1
6       a       1     u            3     2.828223     7              7
7       a       1     i            3     0.000000     4              4
8       a       1     o            3     1.120593     6              6
9       a       1     p            4     1.980000    10             10
10      a       1     z            4     5.110299    11             11
11      b       1     p            2    -9.552318     1              1
12      b       1     r            3     1.175083     6              6
13      b       1     t            3     0.000000     5              5
14      b       1     o            3     9.383253     8              8
15      b       1     w            3     5.782528     7              7
16      b       1     i            3    -0.680999     4              4
17      b       1     y            3    -0.990387     3              3
18      b       1     e            3   -11.187935     2              2
19      b       1     z            3    12.337915     9              9
20      b       1     u            4    -4.799979    11             11
21      b       1     q            4   -25.920000    10             10

because:

df["rank"] = df.assign(new=df[['rank_level1','rank_level2']].agg(tuple, 1)).groupby(['group1','userId'])['new'].rank(method='first', ascending=True).astype(int)

DataError: No numeric types to aggregate

  • Related