I have a dataframe like this:
DURATION CLUSTER COEFF
3 0 0.34
3 1 -0.005
3 2 1
3 3 0.33
4 0 -0.02
4 1 -0.28
4 2 0.22
4 3 0.48
5 0 0.65
5 1 -0.26
5 2 0.1
5 3 0.15
I want to create a RESULT categorical column according to the "COEFF" coefficients for each "DURATION". The one with the greatest "COEFF" value will be "First" and so on.
Desired output like this:
DURATION CLUSTER COEFF RESULT
3 0 0.34 Second
3 1 -0.005 Fourth
3 2 1 First
3 3 0.33 Third
4 0 -0.02 Third
4 1 -0.28 Fourth
4 2 0.22 Second
4 3 0.48 First
5 0 0.65 First
5 1 -0.26 Fourth
5 2 0.1 Third
5 3 0.15 Second
Could you please help me about this?
CodePudding user response:
Use groupby.rank
and map
:
labels = ['First', 'Second', 'Third', 'Fourth', 'Fifth']
df['RESULT'] = (df.groupby('DURATION')['COEFF']
.rank('dense', ascending=False).sub(1)
.map(dict(enumerate(labels)))
)
Output:
DURATION CLUSTER COEFF RESULT
0 3 0 0.340 Second
1 3 1 -0.005 Fourth
2 3 2 1.000 First
3 3 3 0.330 Third
4 4 0 -0.020 Third
5 4 1 -0.280 Fourth
6 4 2 0.220 Second
7 4 3 0.480 First
8 5 0 0.650 First
9 5 1 -0.260 Fourth
10 5 2 0.100 Third
11 5 3 0.150 Second
CodePudding user response:
To go a bit further based on https://stackoverflow.com/a/74547858/7237062 excellent answer (I would not have found this myself that fast), I suggest using this Ordinal numbers replacement to completly automate the process.
import pandas as pd
# see answer https://stackoverflow.com/a/20007730/7237062, others exist
# code golfed version of an "ordinal" function (int -> ordinal string in english)
ordinal = lambda n: "%d%s" % (n,"tsnrhtdd"[(n//10!=1)*(n<4)*n::4])
# copy pasta of OP input data
df = pd.read_clipboard() # let pandas read the clipboard
df["RESULT"] = (df.groupby('DURATION')['COEFF']
.rank('dense', ascending=False)
.sub(1) # mozway's answer so far !
.astype(int)
1 # 1 so ordinals start at 1 (instead of 0)
).apply(ordinal)
results:
DURATION CLUSTER COEFF RESULT
0 3 0 0.340 2nd
1 3 1 -0.005 4th
2 3 2 1.000 1st
3 3 3 0.330 3rd
4 4 0 -0.020 3rd
5 4 1 -0.280 4th
6 4 2 0.220 2nd
7 4 3 0.480 1st
8 5 0 0.650 1st
9 5 1 -0.260 4th
10 5 2 0.100 3rd
11 5 3 0.150 2nd