How to add a column with an order number based on rules to a DataFrame?

Time:06-27

The questions suggested below don't solve my problem, because I want to add an ordering based on rules and they don't address that. This question is not a duplicate.

I have a DataFrame and I need to add a new column with the order number of each value. I was able to do that, but I wonder:

1- Is there a more correct/elegant way to do this?

Also, is it possible:

2- To give equal values the same order number? For example, in my case the second and third rows have the same value. Can both of them be assigned 2?
3- To set a rule that defines the order? For example, if the difference between consecutive rows is less than 0.5, they should get the same order number. If it is more, the order number should increase.

Thank you in advance!

import numpy as np
import pandas as pd

np.random.seed(42)
df2=pd.DataFrame(np.random.randint(1,10, 10), columns=['numbers'])
df2=df2.sort_values('numbers')
# number the sorted rows 1..n
df2['ord']=1 + np.arange(0, len(df2['numbers']))


CodePudding user response:

If you want identical "numbers" to share the same order number, use groupby.ngroup:

df2['ord'] = df2.groupby('numbers').ngroup().add(1)

Output:

   numbers  ord
5        3    1
1        4    2
9        4    2
3        5    3
8        5    3
0        7    4
4        7    4
6        7    4
2        8    5
7        8    5
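As a cross-check on the result above, here is a minimal sketch of another way to get the same numbering, using Series.rank with method='dense' instead of groupby.ngroup. This is a swapped-in technique, not what the answer itself uses. The frame df is rebuilt by hand from the values in the output table, so the snippet runs on its own.

```python
import pandas as pd

# Values copied from the sorted output table above (assumption: same data).
df = pd.DataFrame({'numbers': [3, 4, 4, 5, 5, 7, 7, 7, 8, 8]})

# Dense ranking: equal values share a number and no number is skipped.
# rank() returns floats, so cast to int for a clean order column.
df['ord'] = df['numbers'].rank(method='dense').astype(int)

print(df)
```

Unlike the np.arange approach in the question, this does not depend on the rows being sorted first.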

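Grouping with a threshold (here a new order number starts whenever the difference from the previous row is greater than 1):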

grouper = df2['numbers'].diff().gt(1).cumsum()
df2['ord_threshold'] = df2.groupby(grouper).ngroup().add(1)

Output:

   numbers  ord  ord_threshold
5        3    1              1
1        4    2              1
9        4    2              1
3        5    3              1
8        5    3              1
0        7    4              2
4        7    4              2
6        7    4              2
2        8    5              2
7        8    5              2
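The question asks for a threshold of 0.5, which never triggers differently from 1 on integer data, so here is a sketch of the same diff/cumsum idea on float values. The helper name order_with_threshold and the sample numbers are made up for illustration. I am assuming "more than 0.5" means strictly greater; swap gt for ge if a gap of exactly 0.5 should also start a new number.

```python
import pandas as pd

def order_with_threshold(values, threshold):
    """Hypothetical helper: order numbers that only increase when the
    gap to the previous (sorted) value is greater than `threshold`."""
    s = values.sort_values()
    # diff() is NaN on the first row; NaN > threshold is False,
    # so the first row never opens a new group.
    new_group = s.diff().gt(threshold)
    # Running count of group starts, shifted so numbering begins at 1.
    return new_group.cumsum().add(1)

# Illustrative data, not from the question.
df = pd.DataFrame({'numbers': [1.0, 1.3, 1.6, 2.4, 2.5, 3.2]})
df['ord_threshold'] = order_with_threshold(df['numbers'], 0.5)

print(df)
```

Note that each row is compared only with the row just before it, so values can chain: 1.0 and 1.6 end up with the same number even though they are 0.6 apart, because 1.3 sits between them. Also, the cumsum result is already a consecutive group label, so the extra groupby(...).ngroup() step is not strictly needed in this sketch.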

CodePudding user response:

You can also do this by resetting the index:

import numpy as np
import pandas as pd

np.random.seed(42)
df2=pd.DataFrame(np.random.randint(1,10, 10), columns=['numbers'])
df2=df2.sort_values('numbers').reset_index(drop=True)
#reset the index again so the 0..n-1 positions become an 'index' column
df2.reset_index(inplace=True)
#put the value of the new index (+1) in the ord column
df2['ord']=df2['index'] + 1
#drop the temporary index column
df2.drop(columns='index',inplace=True)

print(df2)

Result:

   numbers  ord
0        3    1
1        4    2
2        4    3
3        5    4
4        5    5
5        7    6
6        7    7
7        7    8
8        8    9
9        8   10

CodePudding user response:

Let us try

df2['ord'] = df2['numbers'].factorize()[0] + 1
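One thing worth knowing about factorize: by default it numbers values in order of first appearance, so the line above gives an order by value only because df2 was sorted beforehand. A small sketch of the difference, using the unsorted values that can be read back from the index column of the output tables above (variable names are just for illustration):

```python
import pandas as pd

# The example values in their original, unsorted row order.
df = pd.DataFrame({'numbers': [7, 4, 8, 5, 7, 3, 7, 8, 5, 4]})

# Default: codes follow order of first appearance, not magnitude.
by_appearance = pd.factorize(df['numbers'])[0] + 1

# sort=True: codes follow the sorted unique values, so the smallest gets 1.
by_value = pd.factorize(df['numbers'], sort=True)[0] + 1

df['ord_appearance'] = by_appearance
df['ord_value'] = by_value
print(df)
```

With sort=True the result should match the ngroup numbering from the first answer without sorting the frame first.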