How does R calculate the PCA ellipses?

Time:11-06

How does R know where to place the confidence ellipses in a PCA plot? Here is a minimal example using the iris dataset:

library(factoextra)
a<-data.matrix(iris[-5])
b<-prcomp(a, scale. = TRUE, center = TRUE)
fviz_pca_ind(b,
             col.ind = iris$Species,
             addEllipses = TRUE)

I know that I can find the plot coordinates with b$x. I also know that I can find the cluster centers with b$center. How do I re-derive the ellipses from the data?

CodePudding user response:

If you are talking about the how: [image not available]

CodePudding user response:

If you trace the code all the way through, you find that the ellipses are simply polygons drawn by geom_polygon with stat = "ellipse", i.e. they are calculated by stat_ellipse in ggplot2.

We can show this by recreating the plot using only base R and ggplot2. The following is a fully reproducible example:

library(ggplot2)

b <- prcomp(iris[-5], scale. = TRUE, center = TRUE)
df <- as.data.frame(predict(b)[,1:2])
df$Species <- iris$Species


ggplot(df, aes(PC1, PC2, color = Species)) +
  geom_point() +
  theme_bw() +
  geom_polygon(stat = "ellipse", aes(fill = Species), alpha = 0.3)

Ultimately, stat_ellipse takes its method from car::dataEllipse, so if you want the raw coordinates of the ellipses, you can do:

e <- car::dataEllipse(df$PC1, df$PC2, df$Species)

and obtain the coordinates of the 95% normal data ellipse like this:

e$setosa$`0.95`
#>               x           y
#>  [1,] -2.167825  2.06328716
#>  [2,] -2.104642  2.04546589
#>  [3,] -2.043166  1.99227221
#>  [4,] -1.984331  1.90451250
#>  [5,] -1.929028  1.78351710
#>  [6,] -1.878095  1.63112017
#>  [7,] -1.832305  1.44963190
#>  [8,] -1.792351  1.24180347
#>  [9,] -1.758839  1.01078534
#> [10,] -1.732278  0.76007952
#> [11,] -1.713069  0.49348644
#> [12,] -1.701504  0.21504739
#> [13,] -1.697759 -0.07101678
#> [14,] -1.701889 -0.36036963
#> [15,] -1.713833 -0.64862486
#> [16,] -1.733410 -0.93141283
#> [17,] -1.760322 -1.20444675
#> [18,] -1.794162 -1.46358770
#> [19,] -1.834417 -1.70490738
#> [20,] -1.880476 -1.92474763
#> [21,] -1.931641 -2.11977588
#> [22,] -1.987137 -2.28703571
#> [23,] -2.046123 -2.42399164
#> [24,] -2.107703 -2.52856754
#> [25,] -2.170946 -2.59917816
#> [26,] -2.234892 -2.63475311
#> [27,] -2.298571 -2.63475311
#> [28,] -2.361018 -2.59917816
#> [29,] -2.421288 -2.52856754
#> [30,] -2.478465 -2.42399164
#> [31,] -2.531684 -2.28703571
#> [32,] -2.580138 -2.11977588
#> [33,] -2.623091 -1.92474763
#> [34,] -2.659894 -1.70490738
#> [35,] -2.689988 -1.46358770
#> [36,] -2.712917 -1.20444675
#> [37,] -2.728333 -0.93141283
#> [38,] -2.736002 -0.64862486
#> [39,] -2.735809 -0.36036963
#> [40,] -2.727757 -0.07101678
#> [41,] -2.711966  0.21504739
#> [42,] -2.688678  0.49348644
#> [43,] -2.658244  0.76007952
#> [44,] -2.621126  1.01078534
#> [45,] -2.577888  1.24180347
#> [46,] -2.529183  1.44963190
#> [47,] -2.475751  1.63112017
#> [48,] -2.418401  1.78351710
#> [49,] -2.358004  1.90451250
#> [50,] -2.295473  1.99227221
#> [51,] -2.231758  2.04546589
#> [52,] -2.167825  2.06328716

Created on 2021-11-05 by the reprex package (v2.0.0)
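
If you want to re-derive an ellipse without any plotting package, the following is a minimal base R sketch of the normal-theory data ellipse. It assumes the construction used by car::dataEllipse and by stat_ellipse(type = "norm"): the group mean as the center, the group covariance as the shape, and a radius taken from the F distribution. The helper name data_ellipse is hypothetical and not part of either package. Note that stat_ellipse defaults to type = "t", which uses a robust covariance estimate, so the ellipses drawn in the plot may differ slightly from these.

```r
# PCA scores for the first two components, as in the example above
b <- prcomp(iris[-5], scale. = TRUE, center = TRUE)
scores <- b$x[, 1:2]

# Hypothetical helper: normal-theory data ellipse for one group of 2-D points
data_ellipse <- function(xy, level = 0.95, segments = 51) {
  center <- colMeans(xy)            # ellipse center = group mean
  shape  <- cov(xy)                 # ellipse shape  = group covariance
  dfn <- 2                          # two plotted dimensions
  dfd <- nrow(xy) - 1
  radius <- sqrt(dfn * qf(level, dfn, dfd))

  # Points on the unit circle; the last point repeats the first
  angles <- (0:segments) * 2 * pi / segments
  unit_circle <- cbind(cos(angles), sin(angles))

  # chol(shape) is upper triangular R with t(R) %*% R == shape,
  # so unit_circle %*% R maps the circle onto the covariance ellipse
  pts <- radius * unit_circle %*% chol(shape)
  sweep(pts, 2, center, "+")        # shift each column by the center
}

setosa_ellipse <- data_ellipse(scores[iris$Species == "setosa", ])
```

Here setosa_ellipse is a 52 x 2 matrix whose first and last rows coincide, so it can be passed to lines() to draw the closed curve. Every point on it lies at the same Mahalanobis distance from the group mean, which is what makes it a data ellipse.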
