Home > Enterprise >  Most efficient way to build ranks within groups based on minimum
Most efficient way to build ranks within groups based on minimum

Time:04-20

I have got a DataFrame in which I'd like to form ranks within groups based on the minimum value of another column. The following syntax already solves the problem. But I ask myself whether there is a more efficient way of doing this? To me this seems to be too circuitous, especially since I need the 'helper' column. The following code shows what shall be done:

df = pd.DataFrame({"group1": [10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20, 20, 20], 
                   "group2": [1,1,1,2,2,2,3,3,3,3,4,4,4], 
                   "length": (2,45,8,4,78,1,1,7,15,6,8,7,5)})

df["helper"] = df.groupby(["group1", 'group2'])["length"].transform("min")
df["goal"] = df.groupby(["group1"])["helper"].rank(method="dense")
df

That's how the dataframe looks like in the end.

    group1  group2  length  helper  goal
0   10      1       2       2       2.0
1   10      1       45      2       2.0
2   10      1       8       2       2.0
3   10      2       4       1       1.0
4   10      2       78      1       1.0
5   10      2       1       1       1.0
6   20      3       1       1       1.0
7   20      3       7       1       1.0
8   20      3       15      1       1.0
9   20      3       6       1       1.0
10  20      4       8       5       2.0
11  20      4       7       5       2.0
12  20      4       5       5       2.0

Of course, the column "goal" is what I want.

CodePudding user response:

If you don't want to make a helper column, you can chain groupby():

df.assign(goal = df.groupby(['group1','group2'])['length'].transform('min')
          .groupby(df['group1']).rank('dense'))

CodePudding user response:

While your method is functional and explicit, an alternative could be to use groupby.apply to use the groups successively:

df['goal'] = (df
 .groupby('group1', group_keys=False)
 .apply(lambda g: g.groupby('group2')['length'].transform('min').rank(method="dense"))
 )

Output:

    group1  group2  length  goal
0       10       1       2   2.0
1       10       1      45   2.0
2       10       1       8   2.0
3       10       2       4   1.0
4       10       2      78   1.0
5       10       2       1   1.0
6       20       3       1   1.0
7       20       3       7   1.0
8       20       3      15   1.0
9       20       3       6   1.0
10      20       4       8   2.0
11      20       4       7   2.0
12      20       4       5   2.0
  • Related