Creating clusters based on a plot

I have a dataset like this:

Region	Year	Month	rate	residuals
1	2010	1	0.5	0.5
2	2010	1	4.0	0.5

The dataset continues like this and has 15,000 observations.

I created a scatter plot:

plot(df$full.residuals, df$rate, main="Scatterplot",
     xlab="Residuals", ylab="Rate")

I am stuck on the next step. Does anyone know how to create clusters in this plot?

CodePudding user response：

First of all, I created some more random data points, because with only 2 points it is hard to form clusters. You could use k-means as the clustering algorithm. In this case I chose 2 clusters, which you can change if you want. With the factoextra package you can create some nice visualizations like this:

library(factoextra)

set.seed(123)
df <- data.frame(rate = runif(20, 0, 1),
                 full.residuals = runif(20, 0, 1))

kmeans_cluster <- kmeans(scale(df), 2, nstart = 5)

kmeans_cluster$cluster
#>  [1] 2 2 2 2 2 1 2 2 1 1 2 2 2 2 1 2 2 1 1 2

fviz_cluster(kmeans_cluster, data = df,
             palette = c("#2E9FDF", "#00AFBB"), 
             geom = "point",
             ellipse.type = "convex", 
             ggtheme = theme_bw())

Created on 2022-08-18 with reprex v2.0.2
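
If you would rather keep your own base-R scatter plot and just colour the points by cluster, something along these lines might work. This is only a sketch: it assumes your data frame has the columns rate and full.residuals (simulated here), and the names km and cols are purely illustrative.

```r
set.seed(123)
# Simulated stand-in for the real data
# (assumption: the real data frame has columns rate and full.residuals)
df <- data.frame(rate = runif(200, 0, 1),
                 full.residuals = runif(200, 0, 1))

# Run k-means on the two standardized columns only
km <- kmeans(scale(df[, c("rate", "full.residuals")]), centers = 2, nstart = 5)

# Store the cluster label next to each observation
df$cluster <- factor(km$cluster)

# Colour the base scatter plot by cluster membership
cols <- c("#2E9FDF", "#E7B800")  # illustrative colours, one per cluster
plot(df$full.residuals, df$rate,
     col = cols[df$cluster], pch = 19,
     main = "Scatterplot by cluster",
     xlab = "Residuals", ylab = "Rate")
legend("topright", legend = levels(df$cluster),
       col = cols, pch = 19, title = "Cluster")
```

Keeping the label in df$cluster also lets you summarise each group afterwards, for example with table(df$cluster) or aggregate(rate ~ cluster, data = df, FUN = mean).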

I would suggest having a look at this link for some extra information about using this package.
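
The choice of 2 clusters above is arbitrary. One rough way to pick the number is the elbow heuristic: run k-means for several values of k and look for the point where the total within-cluster sum of squares stops dropping quickly. The sketch below uses only base R on simulated data; the range 1 to 6 and the names ks and wss are assumptions made for illustration.

```r
set.seed(123)
# Simulated stand-in for the real data (assumption: same two columns as above)
df <- data.frame(rate = runif(200, 0, 1),
                 full.residuals = runif(200, 0, 1))
scaled <- scale(df)

# Total within-cluster sum of squares for each candidate k
ks <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(scaled, centers = k, nstart = 10)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens out
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

factoextra also offers fviz_nbclust(), which draws a similar diagnostic, e.g. fviz_nbclust(scaled, kmeans, method = "wss"). On uniformly random data like this there is no real cluster structure, so do not expect a clear elbow; your real data may behave differently.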