How to automatically change values with new specific values from a dataframe column?

I have a dataframe "df" with multiple columns, in which each row is assigned to one of 150 clusters (the result of a clustering method). From this dataframe I extracted random rows, which make up a smaller dataframe "df_new". This new dataframe contains 9 clusters repeated over more than 100 rows:

...   cluster 
       0
       0
       4
      95
     ...
     155
      98
      95

The cluster numbers present, in order, are: 0, 4, 8, 25, 26, 95, 98, 144, 175

I would like to create a new column "new" that replaces the cluster number in every row with its rank among the clusters present:

initial    new
0          0
4          1
8          2
25         3

How can I apply this to every row?

CodePudding user response:

First, get all the clusters present in your selection:

clusters = df_new["cluster"].unique()

Then, using the sort function from numpy, you can create a dictionary where the keys are the cluster numbers and the values are the rank of each cluster in your sub-selection:

import numpy as np
mapping = dict(zip(np.sort(clusters), range(len(clusters))))
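
As a rough illustration of the mapping step, here is a minimal sketch on made-up labels (the values 30, 10, 20 are hypothetical and not from the question). It builds the dictionary from the sorted unique values, and also shows that np.argsort on its own returns the positions that would sort the array, which match the ranks only in special cases:

```python
import numpy as np

# Hypothetical cluster labels, in order of first appearance
# (the order that Series.unique() would return them in).
clusters = np.array([30, 10, 20])

# Sorted labels paired with consecutive ranks 0, 1, 2, ...
mapping = {int(c): rank for rank, c in enumerate(np.sort(clusters))}

# np.argsort gives the indices that sort the array, not each element's rank.
sort_indices = np.argsort(clusters).tolist()

# Applying argsort twice gives the rank of each element in its original position.
ranks = np.argsort(np.argsort(clusters)).tolist()

print(mapping)
print(sort_indices)
print(ranks)
```

Zipping the labels with sort_indices would send 30 to 1 here, whereas its rank is 2, so building the dictionary from the sorted values is the safer choice.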

Now you can create your new column with:

df_new["new"] = df_new["cluster"].map(mapping)

OUTPUT:

   cluster  new
0        0    0
1        0    0
2        4    1
3       95    2
4      155    4
5       98    3
6       95    2
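
As an alternative sketch (not part of the answer above), pandas can number the labels in one step with pd.factorize and sort=True, which assigns codes by sorted value rather than by first appearance. The sample rows are the ones shown in the question's excerpt:

```python
import pandas as pd

# Sample rows taken from the excerpt shown in the question.
df_new = pd.DataFrame({"cluster": [0, 0, 4, 95, 155, 98, 95]})

# codes[i] is the position of row i's label within the sorted unique labels.
codes, uniques = pd.factorize(df_new["cluster"], sort=True)
df_new["new"] = codes

print(df_new)
```

This avoids building the dictionary by hand; uniques holds the sorted labels, so uniques[code] recovers the original cluster number if needed.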