I have a dataset like this:
Region | Year | Month | rate | residuals |
---|---|---|---|---|
1 | 2010 | 1 | 0.5 | 0.5 |
2 | 2010 | 1 | 4.0 | 0.5 |
The dataset continues like this and has 15,000 observations.
I created a scatter plot:
plot(df$rate, df$full.residuals, main = "Scatterplot",
     xlab = "rate", ylab = "Residuals")
Now I would like to go further and show clusters in this plot. Does anyone know how to do that?
Answer:
First, I generated some more random data points, because with only 2 points it is hard to form clusters. You could use k-means (kmeans) as the clustering algorithm. Here I chose 2 clusters, but you can change that number. With the factoextra package you can create nice visualizations like this:
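library(factoextra)  # also attaches ggplot2, which provides theme_bw()
set.seed(123)
# Toy data: 20 random points, since 2 rows are too few to cluster
df <- data.frame(rate = runif(20, 0, 1),
                 full.residuals = runif(20, 0, 1))
# k-means on standardised data: 2 clusters, best of 5 random starts
kmeans_cluster <- kmeans(scale(df), 2, nstart = 5)
kmeans_cluster$cluster
#> [1] 2 2 2 2 2 1 2 2 1 1 2 2 2 2 1 2 2 1 1 2
# Scatter plot with points coloured by cluster and a convex hull per cluster
fviz_cluster(kmeans_cluster, data = df,
             palette = c("#2E9FDF", "#00AFBB"),
             geom = "point",
             ellipse.type = "convex",
             ggtheme = theme_bw())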
Created on 2022-08-18 with reprex v2.0.2
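If you would rather stay with base graphics and the column layout from the question, a minimal sketch is below. The data frame is hypothetical, generated only to mimic the question's columns (Region, Year, Month, rate, full.residuals), and I assume you want to cluster on rate and full.residuals only. The cluster labels returned by kmeans() are simply passed to col in base plot():

```r
# Hypothetical data mimicking the question's columns; replace with your own df
set.seed(123)
n <- 200
df <- data.frame(
  Region = sample(1:5, n, replace = TRUE),
  Year = 2010,
  Month = sample(1:12, n, replace = TRUE),
  rate = c(rnorm(n / 2, 1, 0.3), rnorm(n / 2, 4, 0.3)),
  full.residuals = rnorm(n, 0, 0.5)
)

# Cluster only on the two plotted variables, standardised so neither dominates
km <- kmeans(scale(df[, c("rate", "full.residuals")]), centers = 2, nstart = 5)

# Keep the cluster label next to the original observations
df$cluster <- factor(km$cluster)

# The question's scatter plot, with points coloured by cluster
plot(df$rate, df$full.residuals,
     col = km$cluster, pch = 19,
     main = "Scatterplot", xlab = "rate", ylab = "Residuals")
legend("topright", legend = levels(df$cluster),
       col = seq_along(levels(df$cluster)), pch = 19, title = "Cluster")
```

With 15,000 rows kmeans() still runs quickly; only the plot gets crowded, so a smaller pch or semi-transparent colours may help.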
I suggest having a look at this link for more information about using the factoextra package.
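Since the number of clusters (2 here) is a choice, one common heuristic is the "elbow" method: run k-means for several values of k and plot the total within-cluster sum of squares, then pick the k where the curve stops dropping sharply. A minimal sketch using the same toy data as above (the range 1:6 is an arbitrary assumption):

```r
# Same toy data as in the answer above
set.seed(123)
df <- data.frame(rate = runif(20, 0, 1),
                 full.residuals = runif(20, 0, 1))
x <- scale(df)

# Total within-cluster sum of squares for k = 1..6
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 5)$tot.withinss)

# Look for the "elbow" where adding more clusters stops helping much
plot(ks, wss, type = "b", pch = 19,
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

factoextra also offers fviz_nbclust(), e.g. fviz_nbclust(scale(df), kmeans, method = "wss"), which draws the same kind of curve with ggplot2.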