PCA select components graphically

Time:06-22

I run a PCA on the decathlon dataset, data(decathlon), from the package FactoMineR like this:

install.packages("FactoMineR")
library(FactoMineR)

install.packages("devtools")
library("devtools")
 
install_github("kassambara/factoextra")
library("factoextra")
 
install.packages("corrplot")
library("corrplot")
 
 
data("decathlon")
# The column list is incomplete in the original post; list the variables to analyse here
head(decathlon[c("Shot.put", "Shot.put")])

options(ggrepel.max.overlaps = Inf)
res.pca <- PCA(decathlon[c("Shot.put", "Shot.put")], scale.unit = TRUE, ncp = 15, graph = TRUE)

and I get a PCA graph of variables.

How can I select an appropriate number of components graphically?

CodePudding user response:

It is up to you, but a common approach is to look at the cumulative percentage of variance.

You can use factoextra::fviz_eig, or build the plot yourself:

library(dplyr)
library(ggplot2)

# Cumulative percentage of variance per component, with a dashed line at 90%
res.pca$eig %>%
  as.data.frame() %>%
  mutate(n = row_number()) %>%
  ggplot(aes(x = n, y = `cumulative percentage of variance`)) +
  geom_col(fill = "steelblue") +
  geom_line() +
  theme_classic() +
  geom_hline(aes(yintercept = 90), lty = 2, color = "red")

The cutoff value of 90 (i.e. 90% of the variance) can be changed.
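If you want to cross-check what you read off the plot, the same cutoff rule can be applied numerically. The following is only a minimal sketch: `n_components_for` is a hypothetical helper name, and the eigenvalues in the example are illustrative, not taken from the decathlon data.

```r
# Hypothetical helper: smallest number of components whose cumulative
# share of variance reaches `threshold` (given in percent).
n_components_for <- function(eigenvalues, threshold = 90) {
  cum_pct <- 100 * cumsum(eigenvalues) / sum(eigenvalues)
  which(cum_pct >= threshold)[1]
}

# Illustrative eigenvalues (not from decathlon): cumulative shares are 40, 70, 90, 100
eig_example <- c(4, 3, 2, 1)
n_components_for(eig_example, threshold = 85)  # 3
```

With a FactoMineR result, the eigenvalues should be available as res.pca$eig[, "eigenvalue"], so the call would be n_components_for(res.pca$eig[, "eigenvalue"]).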

In this case, select PC1 to PC4 (or PC5), which together explain about 90% of the variance in the data.
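For the factoextra::fviz_eig route, a scree plot can be sketched as below. This is an assumption-laden example rather than the code from the question: it uses the first ten columns of decathlon (the ten events) because the question's own column list is incomplete, so the percentages will not necessarily match the numbers quoted above.

```r
library(FactoMineR)
library(factoextra)

data("decathlon")

# Assumption: columns 1 to 10 are the ten decathlon events
res.pca <- PCA(decathlon[, 1:10], scale.unit = TRUE, ncp = 10, graph = FALSE)

# Scree plot: percentage of variance explained by each component,
# with the values printed on top of the bars
p <- fviz_eig(res.pca, addlabels = TRUE)
print(p)
```

Look for the "elbow" where the bars flatten out; components after that point add comparatively little explained variance.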

