I have done KMeans clusters and now I need to analyse each individual cluster. For example look at cluster 1 and see what clients are on it and make conclusions.
dfRFM['idcluster'] = num_cluster
dfRFM.head()
idcliente Recencia Frecuencia Monetario idcluster
1 3 251 44 -90.11 0
2 8 1011 44 87786.44 2
6 88 537 36 8589.57 0
7 98 505 2 -179.00 0
9 156 11 15 35259.50 0
How do I group so I only see results from lets say idcluster 0 and sort by lets say "Monetario". Thanks!
CodePudding user response:
To filter a dataframe, the most common way is to use df[df[colname] == val]
Then you can use df.sort_values()
In your case, that would look like this:
dfRFM_id0 = dfRFM[dfRFM['idcluster']==0].sort_values('Monetario')
The way this filtering works is that dfRFM['idcluster']==0
returns a series of True/False
based on if it is, well, true or false. So then we have a sort of dfRFM[(True,False,True,True...)]
, and so the dataframe returns only the rows where we have a True. That is, filtering/selecting the data where the condition is true.
edit: add 'the way this works...'
CodePudding user response:
I think you actually just need to filter your DF!
df_new = dfRFM[dfRFM.idcluster == 0]
and then sort by Montario
df_new = df_new.sort_values(by = 'Monetario')
Group by is really best for when you're wanting to look at the cluster as a whole - for example, if you wanted to see the average values for Recencia, Frecuencia, and Monetario for all of Group 0.