This is the DataFrame I have:
Cluster | CLV Score |
---|---|
2 | 571,038 |
3 | 1,474,358 |
1 | 568,211 |
Since the table is about customer segmentation, I want to create a new column that contains the name of each cluster based on its CLV score. Another thing to account for is that the data can be changed by the user.
The output would look like this:
Cluster | CLV Score | Cluster Name |
---|---|---|
2 | 1,474,358 | Gold Customer |
1 | 571,038 | Silver Customer |
3 | 568,211 | Dormant Customer |
Any help or explanation is much appreciated. Thank you!
CodePudding user response:
You could use nested np.where calls to check which range each score falls in and write the matching name to a new column.
import numpy as np

# Nested np.where: the first condition that is True decides the name.
df["Cluster Name"] = np.where(
    df["CLV Score"] < 570000, "Dormant Customer",
    np.where(df["CLV Score"] < 1400000, "Silver Customer", "Gold Customer")
)
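If more tiers are added later, the nesting gets hard to read. A flatter alternative is np.select, which is a different NumPy function from the one used above. The sketch below is self-contained and assumes the scores are stored as plain integers (no thousands separators); the names silver_min and gold_min are hypothetical and simply reuse the cut-offs from this answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with scores stored as integers.
df = pd.DataFrame({"Cluster": [2, 3, 1],
                   "CLV Score": [571038, 1474358, 568211]})

# Hypothetical, user-adjustable cut-offs (same values as the answer above).
silver_min = 570_000
gold_min = 1_400_000

# Conditions are checked in order; the first one that is True wins.
conditions = [df["CLV Score"] < silver_min, df["CLV Score"] < gold_min]
names = ["Dormant Customer", "Silver Customer"]

# Rows that match no condition fall through to the default.
df["Cluster Name"] = np.select(conditions, names, default="Gold Customer")
print(df)
```

Changing the two cut-off variables is enough to re-label the clusters when the data changes.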
CodePudding user response:
That's a job for pd.cut():
>>> df.assign(cluster_name=pd.cut(df['CLV Score'], bins=3,
... labels=['Dormant', 'Silver', 'Gold']))
CLV Score cluster_name
Cluster
2 571038 Dormant
3 1474358 Gold
1 568211 Dormant
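Note that 571038 ends up as Dormant here because bins=3 splits the observed range (minimum to maximum score) into three equal-width intervals, each roughly 302,000 wide for this data. A small sketch to inspect the edges pandas picked, using the retbins=True option of pd.cut; the DataFrame construction is an assumption about how the data is stored (Cluster as the index):

```python
import pandas as pd

# Assumed layout: Cluster is the index, scores are integers.
df = pd.DataFrame({"CLV Score": [571038, 1474358, 568211]},
                  index=pd.Index([2, 3, 1], name="Cluster"))

# retbins=True also returns the bin edges that were computed.
binned, edges = pd.cut(df["CLV Score"], bins=3,
                       labels=["Dormant", "Silver", "Gold"],
                       retbins=True)

print(edges)   # four edges delimiting three equal-width bins
print(binned)  # the label assigned to each row
```

With equal-width bins the middle interval can stay empty, as it does here, so no row is labelled Silver.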
You can of course customize the bin edges:
>>> df.assign(cluster_name=pd.cut(
... df['CLV Score'], bins=[0, 0.57e6, 1e6, float('inf')],
... labels=['Dormant', 'Silver', 'Gold']))
CLV Score cluster_name
Cluster
2 571038 Silver
3 1474358 Gold
1 568211 Dormant
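Since the question mentions that the data can change, one option is to keep the edges and labels as parameters instead of hard-coding them. A minimal sketch, assuming the same layout as above; name_clusters and its arguments are hypothetical names, not part of pandas:

```python
import pandas as pd

def name_clusters(df, edges, labels, score_col="CLV Score"):
    """Return a copy of df with a cluster_name column derived from bin edges."""
    # pd.cut needs exactly one more edge than labels.
    if len(edges) != len(labels) + 1:
        raise ValueError("need exactly one more edge than labels")
    return df.assign(
        cluster_name=pd.cut(df[score_col], bins=edges, labels=labels)
    )

# Assumed layout: Cluster is the index, scores are integers.
df = pd.DataFrame({"CLV Score": [571038, 1474358, 568211]},
                  index=pd.Index([2, 3, 1], name="Cluster"))

named = name_clusters(df,
                      edges=[0, 0.57e6, 1e6, float("inf")],
                      labels=["Dormant", "Silver", "Gold"])
print(named)
```

The edges and labels could then come from user input or a config file, and the original DataFrame is left unmodified because assign returns a copy.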