How to order and index the column in a dataframe?

Time:10-21

I am looking for a way to rank the values in a data frame column and store each row's rank in a new column. For example:

df = pd.DataFrame({'YYYYMM':[202206,202207,202206,202209,202206,202207]})
   YYYYMM
0  202206
1  202207
2  202206
3  202209
4  202206
5  202207

Then I tried to order it using NumPy:

df['order'] = np.argsort(df['YYYYMM'])
   YYYYMM  order
0  202206      0
1  202207      2
2  202206      4
3  202209      1
4  202206      5
5  202207      3

However, I want rows with the same value to share the same order, like this:

   YYYYMM  ORDER
0  202206      0
1  202207      1
2  202206      0
3  202209      2
4  202206      0
5  202207      1

How can I achieve this? Thank you.

CodePudding user response:

Use Series.rank with method='dense', convert to integers and subtract 1:

df['order'] = df['YYYYMM'].rank(method='dense').astype(int).sub(1)
print(df)
   YYYYMM  order
0  202206      0
1  202207      1
2  202206      0
3  202209      2
4  202206      0
5  202207      1
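
For reference, here is a self-contained sketch of the same approach with the imports included. It also shows why the np.argsort attempt did not work: argsort returns the positions that would sort the column, not the rank of each row. The variable names sort_positions and dense_rank are only illustrative, and kind='stable' is my addition to make the tie order deterministic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'YYYYMM': [202206, 202207, 202206, 202209, 202206, 202207]})

# argsort answers "which row goes first, second, ..." (positions that sort the data),
# so it is not a per-row rank and ties never share a value.
sort_positions = np.argsort(df['YYYYMM'].to_numpy(), kind='stable')

# Dense ranking: equal values share a rank, and ranks run 1, 2, 3, ... with no gaps.
dense_rank = df['YYYYMM'].rank(method='dense')  # float dtype, first rank is 1.0

# Convert to integers and shift so the smallest value gets order 0.
df['order'] = dense_rank.astype(int).sub(1)
print(df)
```

Note that rank returns floats, and with the default settings a missing value in YYYYMM stays NaN, so astype(int) would fail in that case.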

CodePudding user response:

Use rank with the method='dense' parameter, subtract 1 (because the first rank is 1), and convert to integer:

df['order'] = df['YYYYMM'].rank(method='dense').sub(1).astype(int)

Output:

   YYYYMM  order
0  202206      0
1  202207      1
2  202206      0
3  202209      2
4  202206      0
5  202207      1
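
As an alternative that avoids the float intermediate, the same 0-based numbering can be sketched with pd.factorize(sort=True) or np.unique(return_inverse=True). Neither comes from the answers here; both are offered as equivalent options under the assumption that the column has no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'YYYYMM': [202206, 202207, 202206, 202209, 202206, 202207]})

# factorize assigns an integer code to each distinct value; sort=True makes the
# codes follow the sorted order of the values, so the smallest value gets 0.
codes, uniques = pd.factorize(df['YYYYMM'], sort=True)
df['order'] = codes

# NumPy equivalent: np.unique returns the sorted unique values, and the inverse
# array gives, for each row, the index of its value among those uniques.
_, inverse = np.unique(df['YYYYMM'].to_numpy(), return_inverse=True)
df['order_np'] = inverse

print(df)
```

Both columns contain 0, 1, 0, 2, 0, 1 for this data. One difference to keep in mind: factorize marks missing values with -1 rather than raising an error.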