I am trying to draw a modified PCA plot that represents two categorical variables at once.
Specifically, I want 1) the PCA ellipses drawn per Kingdom, and 2) the sample points colored by the Class variable.
Here is the example dataset.
require(ggplot2)
require(ggfortify)
set.seed(1)
df <- structure(list(Sample = c("cat", "dog", "rabbit", "chicken", "duck", "butterfly", "ladybug", "rose", "lily", "iris", "maple tree", "pinetree", "ginkgo"),
Class = c("mammalia", "mammalia", "mammalia", "bird", "bird", "insect", "insect", "flower", "flower", "flower", "tree", "tree", "tree"),
Kingdom = c("animalia", "animalia", "animalia", "animalia", "animalia", "animalia", "animalia", "plantae", "plantae", "plantae", "plantae", "plantae", "plantae")),
class = "data.frame", row.names = c(NA, -13L))
rownames(df)<-df[,1]
df[,1]<-NULL
for(i in 3:20){
df[,i]<-sample(100, size=nrow(df), replace=TRUE)
}
df[,c(3:20)]<-log(df[,c(3:20)] + 1, base=2)
df # I omitted columns V13 to V20 for simplicity
Class Kingdom V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
cat mammalia animalia 6.108524 5.781360 5.087463 5.357552 4.247928 5.614710 5.000000 5.129283 5.321928 6.303781
dog mammalia animalia 5.321928 6.228819 6.409391 5.491853 4.523562 6.491853 4.169925 3.459432 4.643856 4.523562
rabbit mammalia animalia 1.000000 3.000000 5.169925 4.700440 6.303781 4.584963 6.209453 1.000000 5.754888 6.149747
chicken bird animalia 5.129283 6.209453 6.149747 6.149747 6.044394 6.409391 6.459432 5.459432 6.539159 4.857981
duck bird animalia 6.459432 6.321928 6.228819 5.321928 6.149747 4.906891 6.392317 5.906891 6.442943 5.247928
butterfly insect animalia 5.459432 6.426265 5.426265 5.700440 6.459432 3.807355 6.507795 4.754888 5.357552 5.954196
ladybug insect animalia 3.906891 5.247928 5.285402 5.426265 6.149747 4.523562 5.614710 4.000000 6.614710 5.554589
rose flower plantae 6.375039 6.491853 4.392317 2.807355 6.247928 6.554589 6.022368 5.882643 6.392317 6.087463
lily flower plantae 5.906891 5.247928 4.857981 4.643856 6.357552 4.857981 6.569856 4.906891 6.507795 6.442943
iris flower plantae 5.700440 5.129283 4.392317 5.044394 6.658211 5.614710 6.599913 4.643856 5.169925 6.643856
maple tree tree plantae 6.614710 6.491853 5.491853 3.906891 3.807355 5.087463 5.930737 5.426265 5.459432 6.169925
pinetree tree plantae 6.426265 5.491853 6.459432 1.584963 5.357552 5.523562 5.700440 5.614710 1.000000 6.643856
ginkgo tree plantae 4.459432 6.321928 6.149747 5.523562 6.491853 4.459432 6.554589 6.266787 4.906891 5.700440
After constructing the dataset, I drew the PCA plot using autoplot:
pca_df<-prcomp(df[,-c(1,2)], scale.=TRUE)
autoplot(pca_df)
autoplot(pca_df, data=df, colour='Kingdom', frame.type='norm')
On top of this plot, I want the points to be colored according to Class, as in the following plot:
pca_df<-prcomp(df[,-c(1,2)], scale.=TRUE)
autoplot(pca_df)
autoplot(pca_df, data=df, colour='Class')
In summary, I want the ellipses to come from the Kingdom variable and the point colors to come from the Class variable. How can I combine these two representations in one PCA plot?
Additional trial
I also tried drawing the PCA plot with another package (factoextra), but it gives the same result.
require (factoextra)
fviz_pca_ind(pca_df,
col.ind=df$Kingdom,
addEllipses = TRUE,
col.ind.sup=df$Class,
repel=TRUE)
CodePudding user response:
You can specify frame.colour for the ellipses and colour for the points:
autoplot(pca_df, data=df, colour='Class', frame.type='norm', frame.colour = "Kingdom")
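If you want finer control than autoplot offers, the same idea can be sketched with plain ggplot2 by plotting the PCA scores yourself. This is only an illustrative alternative, not part of the answer above: the names meta, vals and scores are made up for the example, the data are regenerated randomly rather than taken from the question's df, and the ellipses are distinguished by linetype so that the colour scale stays reserved for Class.

```r
library(ggplot2)

set.seed(1)

# Hypothetical stand-in for the question's metadata columns (13 samples).
meta <- data.frame(
  Class   = c(rep("mammalia", 3), rep("bird", 2), rep("insect", 2),
              rep("flower", 3), rep("tree", 3)),
  Kingdom = c(rep("animalia", 7), rep("plantae", 6))
)

# Random log2-transformed values, mimicking the 18 numeric columns.
vals <- matrix(log2(sample(100, nrow(meta) * 18, replace = TRUE) + 1),
               nrow = nrow(meta))

pca <- prcomp(vals, scale. = TRUE)

# Combine the first two principal component scores with the metadata.
scores <- data.frame(pca$x[, 1:2], meta)

p <- ggplot(scores, aes(x = PC1, y = PC2)) +
  # One normal-theory ellipse per Kingdom, told apart by linetype.
  stat_ellipse(aes(group = Kingdom, linetype = Kingdom), type = "norm") +
  # Points colored by Class.
  geom_point(aes(colour = Class), size = 3)

print(p)
```

With the question's own objects, scores would instead be built from pca_df$x and the Class and Kingdom columns of df.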