How to create a new column with a custom cluster name according to the value of clv score, K-Means C

Time:12-22

This is the DataFrame I have:

Cluster  CLV Score
2          571,038
3        1,474,358
1          568,211

Since the table is about customer segmentation, I want to create a new column that contains a name for each cluster based on its CLV score. Another thing to account for is that the clusters can be changed by the user.

The output would look like this:

Cluster  CLV Score  Cluster Name
2        1,474,358  Gold Customer
1          571,038  Silver Customer
3          568,211  Dormant Customer

Any help or explanation is much appreciated. Thank you!

CodePudding user response:

You could use nested np.where calls to check which range each score falls in and write the matching name to a new column.

import numpy as np

# Nested np.where: the first condition that holds decides the name
df["Cluster Name"] = np.where(
    df["CLV Score"] < 570000, "Dormant Customer",
    np.where(df["CLV Score"] < 1400000, "Silver Customer", "Gold Customer")
)
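
If more tiers get added, the nesting becomes hard to read. A flatter variant of the same idea is np.select, which takes a list of conditions and a list of names. This is only a sketch: the DataFrame is rebuilt from the question's sample with the scores as plain integers (an assumption), and the thresholds are the same illustrative ones as above.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with CLV Score as integers (assumed dtype)
df = pd.DataFrame({"Cluster": [2, 3, 1],
                   "CLV Score": [571038, 1474358, 568211]})

# Illustrative thresholds; adjust them to your own segmentation
conditions = [
    df["CLV Score"] < 570_000,
    df["CLV Score"] < 1_400_000,
]
names = ["Dormant Customer", "Silver Customer"]

# np.select uses the name of the first condition that is True
# and falls back to `default` when no condition matches
df["Cluster Name"] = np.select(conditions, names, default="Gold Customer")
print(df)
```

The order of the conditions matters, since the first match wins, just like in the nested version.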

CodePudding user response:

That's a job for pd.cut():

>>> df.assign(cluster_name=pd.cut(df['CLV Score'], bins=3,
...                               labels=['Dormant', 'Silver', 'Gold']))
         CLV Score cluster_name
Cluster                        
2           571038      Dormant
3          1474358         Gold
1           568211      Dormant
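
Note that with bins=3 pandas splits the range between the lowest and highest score into three equal-width intervals, which is why 571038 still lands in the lowest bin. To see which edges were used, you can pass retbins=True. A small sketch, rebuilding the sample frame from the question:

```python
import pandas as pd

# Sample data from the question, indexed by cluster id
df = pd.DataFrame({"CLV Score": [571038, 1474358, 568211]},
                  index=pd.Index([2, 3, 1], name="Cluster"))

# retbins=True returns the computed bin edges alongside the labels
labels, edges = pd.cut(df["CLV Score"], bins=3,
                       labels=["Dormant", "Silver", "Gold"], retbins=True)
print(edges)  # four edges delimiting three equal-width bins
```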

You can of course customize the bin edges:

>>> df.assign(cluster_name=pd.cut(
...     df['CLV Score'], bins=[0, 0.57e6, 1e6, float('inf')],
...     labels=['Dormant', 'Silver', 'Gold']))
         CLV Score cluster_name
Cluster                        
2           571038       Silver
3          1474358         Gold
1           568211      Dormant
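
If the scores change whenever the clustering is rerun, fixed edges may stop matching. A different technique (not pd.cut) is to name the clusters by rank instead. This is only a sketch, assuming there is exactly one name per cluster and that the names are listed from highest to lowest score:

```python
import pandas as pd

# Sample data from the question, indexed by cluster id
df = pd.DataFrame({"CLV Score": [571038, 1474358, 568211]},
                  index=pd.Index([2, 3, 1], name="Cluster"))

# Names ordered from highest to lowest CLV score (assumed ordering)
names = ["Gold Customer", "Silver Customer", "Dormant Customer"]

# ascending=False gives rank 1 to the highest score;
# method="first" breaks ties by order of appearance
ranks = df["CLV Score"].rank(method="first", ascending=False).astype(int)
df["Cluster Name"] = ranks.map(lambda r: names[r - 1])
print(df.sort_values("CLV Score", ascending=False))
```

This breaks if there are more clusters than names, so the list has to be kept in sync with the number of clusters.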