How does R know where to place the confidence ellipse for a PCA plot? I have a minimal code using the iris dataset:
library(factoextra)
a<-data.matrix(iris[-5])
b<-prcomp(a, scale. = TRUE, center = TRUE)
fviz_pca_ind(b,
col.ind = iris$Species,
addEllipses = TRUE)
I know that I can find the plot coordinates with b$x. I also know that I can find the cluster centers with b$center. How do I re-derive the ellipses from the data?
CodePudding user response:
If you are talking about the how,
CodePudding user response:
If you trace the code all the way through, you find that the ellipses are simply geom_polygons created with stat = "ellipse", i.e. they are calculated by stat_ellipse in ggplot2.
We can show this by recreating the plot using only base R and ggplot2. The following is a fully reproducible example:
library(ggplot2)
b <- prcomp(iris[-5], scale. = TRUE, center = TRUE)
df <- as.data.frame(predict(b)[,1:2])
df$Species <- iris$Species
ggplot(df, aes(PC1, PC2, color = Species)) +
  geom_point() +
  theme_bw() +
  geom_polygon(stat = "ellipse", aes(fill = Species), alpha = 0.3)
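If you only need the coordinates that ggplot2 itself draws, one option is to read them back from the built plot with layer_data. This is a minimal sketch rather than the only way to do it. It assumes you want the normal-theory ellipse, so it sets type = "norm" explicitly (the default for stat_ellipse is type = "t"), and it assumes the ellipse is the second layer of the plot:

```r
library(ggplot2)

# Same PCA scores as in the example above
b <- prcomp(iris[-5], scale. = TRUE, center = TRUE)
df <- as.data.frame(predict(b)[, 1:2])
df$Species <- iris$Species

# Layer 1 is the points, layer 2 is the ellipse outline
p <- ggplot(df, aes(PC1, PC2, color = Species)) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.95)

# layer_data() returns the data frame ggplot2 computed for that layer
ell <- layer_data(p, 2)
head(ell[, c("x", "y", "group")])
```

The group column identifies the species, in the order of the factor levels of df$Species.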
Ultimately, stat_ellipse uses essentially the same method as car::dataEllipse, so if you want the raw co-ordinates of the ellipses, you can do:
e <- car::dataEllipse(df$PC1, df$PC2, df$Species)
and obtain the 95th centile normal data ellipse co-ordinates like this:
e$setosa$`0.95`
#> x y
#> [1,] -2.167825 2.06328716
#> [2,] -2.104642 2.04546589
#> [3,] -2.043166 1.99227221
#> [4,] -1.984331 1.90451250
#> [5,] -1.929028 1.78351710
#> [6,] -1.878095 1.63112017
#> [7,] -1.832305 1.44963190
#> [8,] -1.792351 1.24180347
#> [9,] -1.758839 1.01078534
#> [10,] -1.732278 0.76007952
#> [11,] -1.713069 0.49348644
#> [12,] -1.701504 0.21504739
#> [13,] -1.697759 -0.07101678
#> [14,] -1.701889 -0.36036963
#> [15,] -1.713833 -0.64862486
#> [16,] -1.733410 -0.93141283
#> [17,] -1.760322 -1.20444675
#> [18,] -1.794162 -1.46358770
#> [19,] -1.834417 -1.70490738
#> [20,] -1.880476 -1.92474763
#> [21,] -1.931641 -2.11977588
#> [22,] -1.987137 -2.28703571
#> [23,] -2.046123 -2.42399164
#> [24,] -2.107703 -2.52856754
#> [25,] -2.170946 -2.59917816
#> [26,] -2.234892 -2.63475311
#> [27,] -2.298571 -2.63475311
#> [28,] -2.361018 -2.59917816
#> [29,] -2.421288 -2.52856754
#> [30,] -2.478465 -2.42399164
#> [31,] -2.531684 -2.28703571
#> [32,] -2.580138 -2.11977588
#> [33,] -2.623091 -1.92474763
#> [34,] -2.659894 -1.70490738
#> [35,] -2.689988 -1.46358770
#> [36,] -2.712917 -1.20444675
#> [37,] -2.728333 -0.93141283
#> [38,] -2.736002 -0.64862486
#> [39,] -2.735809 -0.36036963
#> [40,] -2.727757 -0.07101678
#> [41,] -2.711966 0.21504739
#> [42,] -2.688678 0.49348644
#> [43,] -2.658244 0.76007952
#> [44,] -2.621126 1.01078534
#> [45,] -2.577888 1.24180347
#> [46,] -2.529183 1.44963190
#> [47,] -2.475751 1.63112017
#> [48,] -2.418401 1.78351710
#> [49,] -2.358004 1.90451250
#> [50,] -2.295473 1.99227221
#> [51,] -2.231758 2.04546589
#> [52,] -2.167825 2.06328716
Created on 2021-11-05 by the reprex package (v2.0.0)
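To re-derive an ellipse from the data by hand, here is a rough sketch in base R of the normal data ellipse. As far as I can tell it follows the same recipe as car::dataEllipse and stat_ellipse(type = "norm"): centre at the group mean, shape from the group covariance matrix, and radius taken from an F quantile. Note that normal_ellipse is a hypothetical helper written for this illustration, not a function from either package, and that stat_ellipse's default type = "t" uses a robust covariance estimate instead, so its ellipses will differ slightly.

```r
# PCA scores for one group (setosa), first two components
b <- prcomp(iris[-5], scale. = TRUE, center = TRUE)
scores <- b$x[iris$Species == "setosa", 1:2]

# Hypothetical helper: normal data ellipse for a two-column matrix
normal_ellipse <- function(xy, level = 0.95, segments = 51) {
  center <- colMeans(xy)                       # ellipse centre
  shape  <- cov(xy)                            # 2 x 2 covariance matrix
  # Radius from the F distribution with 2 and n - 1 degrees of freedom
  radius <- sqrt(2 * qf(level, 2, nrow(xy) - 1))
  angles <- (0:segments) * 2 * pi / segments
  unit_circle <- cbind(cos(angles), sin(angles))
  # chol(shape) is the upper-triangular R with t(R) %*% R equal to shape,
  # so multiplying by it stretches the unit circle into the ellipse
  pts <- radius * unit_circle %*% chol(shape)
  sweep(pts, 2, center, "+")                   # shift to the group centre
}

ell <- normal_ellipse(scores)
head(ell)
```

Every point returned lies at the same Mahalanobis distance from the group mean, which is what makes it a "data ellipse". The vertices may start at a different point on the curve than the ones car returns, but they should trace the same outline.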