Creating clusters based on a plot

Time:08-19

I have a dataset like this:

Region Year Month rate residuals
1 2010 1 0.5 0.5
2 2010 1 4.0 0.5

The dataset continues like this; it has 15,000 observations.

I created a scatter plot:

plot(df$full.residuals, df$rate, main="Scatterplot", 
   xlab="Residuals", ylab="rate")

Now I don't know how to go further and create clusters in the plot. Does anyone know how to do this?

CodePudding user response:

First of all, I created some more random data points, because it is hard to create clusters with only 2 points. You could use k-means as the clustering algorithm. In this case I chose to create 2 clusters, which you can change if you want. With the factoextra package you can create some nice visualizations like this:

library(factoextra)

set.seed(123)
df <- data.frame(rate = runif(20, 0, 1),
                 full.residuals = runif(20, 0, 1))

# Standardize both columns, then run k-means with 2 clusters and 5 random starts
kmeans_cluster <- kmeans(scale(df), 2, nstart = 5)

kmeans_cluster$cluster
#>  [1] 2 2 2 2 2 1 2 2 1 1 2 2 2 2 1 2 2 1 1 2

fviz_cluster(kmeans_cluster, data = df,
             palette = c("#2E9FDF", "#00AFBB"), 
             geom = "point",
             ellipse.type = "convex", 
             ggtheme = theme_bw())

Created on 2022-08-18 with reprex v2.0.2
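
If you are not sure how many clusters to ask for, one common heuristic is the "elbow" plot: run k-means for several values of k and look for the point where the total within-cluster sum of squares stops dropping quickly. The following is only a minimal sketch on the same random data as above; the range 1:6 and nstart = 25 are my own assumptions, not something from the question, and it uses base R only.

```r
set.seed(123)
df <- data.frame(rate = runif(20, 0, 1),
                 full.residuals = runif(20, 0, 1))

# Standardize once so every k is fitted on the same scaled data
scaled <- scale(df)

# Candidate numbers of clusters (assumed range, adjust to your data)
k_values <- 1:6

# Total within-cluster sum of squares for each k
wss <- sapply(k_values, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

With random uniform data like this there is usually no clear elbow, so treat the plot as a guide rather than a rule.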

I would suggest having a look at this link for some extra information about using the factoextra package.
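
If you would rather stay with the base plot() call from the question, the cluster assignments can also be used directly as point colours. This is a rough sketch on the same random data; the colours, the cluster column name and the legend position are arbitrary choices of mine.

```r
set.seed(123)
df <- data.frame(rate = runif(20, 0, 1),
                 full.residuals = runif(20, 0, 1))

# Cluster on the two standardized columns before adding anything else to df
km <- kmeans(scale(df), centers = 2, nstart = 5)

# Store the assignment as a factor (hypothetical column name "cluster")
df$cluster <- factor(km$cluster)

# One colour per cluster; indexing by the factor uses its integer codes
cluster_cols <- c("#2E9FDF", "#E7B800")

plot(df$full.residuals, df$rate,
     col = cluster_cols[df$cluster], pch = 19,
     main = "Scatterplot coloured by cluster",
     xlab = "Residuals", ylab = "rate")
legend("topright", legend = levels(df$cluster),
       col = cluster_cols, pch = 19, title = "Cluster")
```

On your real data you would replace the simulated df with your own data frame and pass only the numeric columns you want to cluster on to scale().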
