I have got a DataFrame in which I'd like to form ranks within groups based on the minimum value of another column. The following syntax already solves the problem. But I ask myself whether there is a more efficient way of doing this? To me this seems to be too circuitous, especially since I need the 'helper' column. The following code shows what shall be done:
df = pd.DataFrame({"group1": [10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20, 20, 20],
"group2": [1,1,1,2,2,2,3,3,3,3,4,4,4],
"length": (2,45,8,4,78,1,1,7,15,6,8,7,5)})
df["helper"] = df.groupby(["group1", 'group2'])["length"].transform("min")
df["goal"] = df.groupby(["group1"])["helper"].rank(method="dense")
df
That's how the dataframe looks like in the end.
group1 group2 length helper goal
0 10 1 2 2 2.0
1 10 1 45 2 2.0
2 10 1 8 2 2.0
3 10 2 4 1 1.0
4 10 2 78 1 1.0
5 10 2 1 1 1.0
6 20 3 1 1 1.0
7 20 3 7 1 1.0
8 20 3 15 1 1.0
9 20 3 6 1 1.0
10 20 4 8 5 2.0
11 20 4 7 5 2.0
12 20 4 5 5 2.0
Of course, the column "goal" is what I want.
CodePudding user response:
If you don't want to make a helper column, you can chain groupby()
:
df.assign(goal = df.groupby(['group1','group2'])['length'].transform('min')
.groupby(df['group1']).rank('dense'))
CodePudding user response:
While your method is functional and explicit, an alternative could be to use groupby.apply
to use the groups successively:
df['goal'] = (df
.groupby('group1', group_keys=False)
.apply(lambda g: g.groupby('group2')['length'].transform('min').rank(method="dense"))
)
Output:
group1 group2 length goal
0 10 1 2 2.0
1 10 1 45 2.0
2 10 1 8 2.0
3 10 2 4 1.0
4 10 2 78 1.0
5 10 2 1 1.0
6 20 3 1 1.0
7 20 3 7 1.0
8 20 3 15 1.0
9 20 3 6 1.0
10 20 4 8 2.0
11 20 4 7 2.0
12 20 4 5 2.0