I am running a PCA on the decathlon dataset (data(decathlon)) from the FactoMineR package, like this:
install.packages("FactoMineR")
library(FactoMineR)
install.packages("devtools")
library("devtools")
install_github("kassambara/factoextra")
library("factoextra")
install.packages("corrplot")
library("corrplot")
data("decathlon")
head( decathlon[c("Shot.put", "Shot.put", )])
options(ggrepel.max.overlaps = Inf)
res.pca <- PCA( decathlon[c("Shot.put", "Shot.put", )], scale.unit=TRUE, ncp=15, graph=TRUE)
and I get a PCA graph of variables.
How can I select an appropriate number of components graphically?
CodePudding user response:
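The choice is ultimately up to you, but a common criterion is the cumulative percentage of variance explained.
You can draw a scree plot with factoextra::fviz_eig(res.pca),
or plot the cumulative percentage yourself: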
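library(dplyr)
library(ggplot2)
# Plot the cumulative percentage of variance against the component number
res.pca$eig %>%
  as.data.frame() %>%
  mutate(n = row_number()) %>%
  ggplot(aes(x = n, y = `cumulative percentage of variance`)) +
  geom_col(fill = "steelblue") +
  geom_line() +
  theme_classic() +
  # Dashed red line marks the 90% cutoff
  geom_hline(yintercept = 90, lty = 2, color = "red")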
The 90% cutoff (yintercept = 90, since the column is expressed in percent) is only a convention and can be changed.
In this case, select PC1 to PC4 (or PC5), which together explain about 90% of the variance in the data.
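If you also want the number of components as a value and not only read it off the plot, here is a minimal sketch. It assumes the active variables are the ten event columns (columns 1 to 10 of decathlon), which may not match the columns selected in the question, and the names cutoff, cum_var and n_comp are only illustrative.

```r
library(FactoMineR)
library(factoextra)

data("decathlon")

# Assumption: use the ten event columns (1:10) as active variables;
# replace this selection with the columns you actually analyse.
res.pca <- PCA(decathlon[, 1:10], scale.unit = TRUE, ncp = 10, graph = FALSE)

# Scree plot: percentage of variance explained by each component
fviz_eig(res.pca, addlabels = TRUE)

# Smallest number of components whose cumulative variance reaches the cutoff
cutoff <- 90  # in percent, same scale as res.pca$eig
cum_var <- res.pca$eig[, "cumulative percentage of variance"]
n_comp <- which(cum_var >= cutoff)[1]
n_comp
```

Changing cutoff (for example to 80) gives the corresponding number of components, so the plot and the number stay consistent.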