I am looking for a way to order a data frame and create a column that holds the sort order. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'YYYYMM':[202206,202207,202206,202209,202206,202207]})
YYYYMM
0 202206
1 202207
2 202206
3 202209
4 202206
5 202207
Then I tried to order it using NumPy:
df['order'] = np.argsort(df['YYYYMM'])
YYYYMM order
0 202206 0
1 202207 2
2 202206 4
3 202209 1
4 202206 5
5 202207 3
However, I want equal values to share the same order, like this:
YYYYMM ORDER
0 202206 0
1 202207 1
2 202206 0
3 202209 2
4 202206 0
5 202207 1
What should I do to achieve it? Thank you.
CodePudding user response:
Use Series.rank with method='dense', then convert to integers and subtract 1:
df['order'] = df['YYYYMM'].rank(method='dense').astype(int).sub(1)
print(df)
YYYYMM order
0 202206 0
1 202207 1
2 202206 0
3 202209 2
4 202206 0
5 202207 1
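For context on why np.argsort did not give this result: it returns the row positions that would sort the column, not the rank of each row. The sketch below shows both side by side on the data from the question. The names positions and ranks are only illustrative, and kind='stable' is an added assumption so that ties come out in a predictable order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'YYYYMM': [202206, 202207, 202206, 202209, 202206, 202207]})

# argsort: positions of the rows in sorted order (rows 0, 2, 4 come first, ...)
positions = np.argsort(df['YYYYMM'].to_numpy(), kind='stable')
print(positions.tolist())  # [0, 2, 4, 1, 5, 3]

# dense rank: distinct values are numbered 1, 2, 3, ... and ties share a rank
ranks = df['YYYYMM'].rank(method='dense')
print(ranks.tolist())  # [1.0, 2.0, 1.0, 3.0, 1.0, 2.0]

# rank returns floats starting at 1, hence the cast and the subtraction
df['order'] = ranks.astype(int).sub(1)
print(df['order'].tolist())  # [0, 1, 0, 2, 0, 1]
```

Note that the cast to int would fail if the column contained missing values, since rank leaves those as NaN.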
CodePudding user response:
Use rank with the method='dense' parameter, subtract 1 (since the first rank is 1), and convert to integer:
df['order'] = df['YYYYMM'].rank(method='dense').sub(1).astype(int)
output:
YYYYMM order
0 202206 0
1 202207 1
2 202206 0
3 202209 2
4 202206 0
5 202207 1
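As a possible alternative (not part of the answers above), the same zero-based codes can probably be obtained without the float round trip, using pd.factorize with sort=True or np.unique with return_inverse=True. This is a minimal sketch on the question's data; the variable names codes, uniques and inverse are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'YYYYMM': [202206, 202207, 202206, 202209, 202206, 202207]})

# factorize with sort=True assigns integer codes following the sorted unique values
codes, uniques = pd.factorize(df['YYYYMM'], sort=True)
df['order'] = codes
print(df['order'].tolist())  # [0, 1, 0, 2, 0, 1]

# np.unique with return_inverse=True yields the same codes for a 1-D array
_, inverse = np.unique(df['YYYYMM'].to_numpy(), return_inverse=True)
print(inverse.tolist())  # [0, 1, 0, 2, 0, 1]
```

Both variants return integers directly, so no astype(int) or subtraction is needed.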