Group by a category

I have run KMeans clustering and now I need to analyse each cluster individually. For example, look at cluster 1, see which clients are in it, and draw conclusions.

dfRFM['idcluster'] = num_cluster
dfRFM.head()

   idcliente  Recencia  Frecuencia  Monetario  idcluster
1          3       251          44     -90.11          0
2          8      1011          44   87786.44          2
6         88       537          36    8589.57          0
7         98       505           2    -179.00          0
9        156        11          15   35259.50          0

How do I group so that I only see results from, let's say, idcluster 0, and sort them by, let's say, "Monetario"? Thanks!

CodePudding user response：

To filter a DataFrame, the most common way is boolean indexing: df[df[colname] == val]. You can then call df.sort_values() on the result.

In your case, that would look like this:

dfRFM_id0 = dfRFM[dfRFM['idcluster']==0].sort_values('Monetario')

This filtering works because dfRFM['idcluster']==0 returns a boolean Series: True for each row where idcluster is 0 and False for every other row. Indexing with it is then effectively dfRFM[[True, False, True, True, ...]], and the DataFrame returns only the rows where the value is True, that is, the rows where the condition holds.
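To make the mask visible, here is a minimal sketch that rebuilds the five rows shown in the question (the DataFrame construction is an assumption for illustration; your real dfRFM comes from your own pipeline) and prints the intermediate boolean Series before filtering and sorting:

```python
import pandas as pd

# Sample rows copied from the dfRFM.head() output in the question
dfRFM = pd.DataFrame(
    {
        "idcliente": [3, 8, 88, 98, 156],
        "Recencia": [251, 1011, 537, 505, 11],
        "Frecuencia": [44, 44, 36, 2, 15],
        "Monetario": [-90.11, 87786.44, 8589.57, -179.00, 35259.50],
        "idcluster": [0, 2, 0, 0, 0],
    },
    index=[1, 2, 6, 7, 9],
)

# Boolean Series: True where the row belongs to cluster 0
mask = dfRFM["idcluster"] == 0
print(mask.tolist())  # [True, False, True, True, True]

# Keep only the True rows, then sort them by Monetario (ascending by default)
dfRFM_id0 = dfRFM[mask].sort_values("Monetario")
print(dfRFM_id0)
```

sort_values sorts in ascending order by default; pass ascending=False if you want the highest Monetario first.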


CodePudding user response：

I think you actually just need to filter your DataFrame!

df_new = dfRFM[dfRFM.idcluster == 0]

and then sort by Monetario

df_new = df_new.sort_values(by = 'Monetario')

Group by is really best when you want to look at each cluster as a whole, for example if you wanted to see the average values of Recencia, Frecuencia, and Monetario for all of cluster 0.
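As a rough sketch of that per-cluster summary, assuming the same five sample rows from the question (rebuilt here by hand for illustration), groupby followed by mean gives one row of averages per cluster:

```python
import pandas as pd

# Sample rows copied from the dfRFM.head() output in the question
dfRFM = pd.DataFrame(
    {
        "idcliente": [3, 8, 88, 98, 156],
        "Recencia": [251, 1011, 537, 505, 11],
        "Frecuencia": [44, 44, 36, 2, 15],
        "Monetario": [-90.11, 87786.44, 8589.57, -179.00, 35259.50],
        "idcluster": [0, 2, 0, 0, 0],
    },
    index=[1, 2, 6, 7, 9],
)

# One row per cluster, holding the mean of each RFM column
cluster_means = dfRFM.groupby("idcluster")[["Recencia", "Frecuencia", "Monetario"]].mean()
print(cluster_means)

# Averages for cluster 0 only
print(cluster_means.loc[0])
```

On the full data this returns one row per cluster id, so you can compare the clusters side by side rather than inspecting individual clients.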