How to overlay different information in a PCA plot?

Time:11-01

I am trying to draw a modified PCA plot that represents two categorical variables at once.
For example, I want to draw a PCA plot with 1) ellipses based on Kingdom and 2) sample points colored by the Class variable.
Here is the example dataset:

library(ggplot2)
library(ggfortify)  # provides autoplot() methods for prcomp objects

set.seed(1)
df <- structure(list(Sample = c("cat", "dog", "rabbit", "chicken", "duck", "butterfly", "ladybug", "rose", "lily", "iris", "maple tree", "pinetree", "ginkgo"), 
                     Class = c("mammalia", "mammalia", "mammalia", "bird", "bird", "insect", "insect", "flower", "flower", "flower", "tree", "tree", "tree"), 
                     Kingdom = c("animalia", "animalia", "animalia", "animalia", "animalia", "animalia", "animalia", "plantae", "plantae", "plantae", "plantae", "plantae", "plantae")), 
                class = "data.frame", row.names = c(NA, -13L))
# use the sample names as row names and drop the Sample column
rownames(df) <- df[, 1]
df[, 1] <- NULL

# add 18 random count columns (V3 to V20)
for (i in 3:20) {
  df[, i] <- sample(100, size = nrow(df), replace = TRUE)
}
# log2(x + 1) transform of the numeric columns
df[, c(3:20)] <- log(df[, c(3:20)] + 1, base = 2)

df # I omitted the values from V13 to V20 for simplicity
              Class  Kingdom       V3       V4       V5       V6       V7       V8       V9      V10      V11      V12
cat        mammalia animalia 6.108524 5.781360 5.087463 5.357552 4.247928 5.614710 5.000000 5.129283 5.321928 6.303781
dog        mammalia animalia 5.321928 6.228819 6.409391 5.491853 4.523562 6.491853 4.169925 3.459432 4.643856 4.523562
rabbit     mammalia animalia 1.000000 3.000000 5.169925 4.700440 6.303781 4.584963 6.209453 1.000000 5.754888 6.149747
chicken        bird animalia 5.129283 6.209453 6.149747 6.149747 6.044394 6.409391 6.459432 5.459432 6.539159 4.857981
duck           bird animalia 6.459432 6.321928 6.228819 5.321928 6.149747 4.906891 6.392317 5.906891 6.442943 5.247928
butterfly    insect animalia 5.459432 6.426265 5.426265 5.700440 6.459432 3.807355 6.507795 4.754888 5.357552 5.954196
ladybug      insect animalia 3.906891 5.247928 5.285402 5.426265 6.149747 4.523562 5.614710 4.000000 6.614710 5.554589
rose         flower  plantae 6.375039 6.491853 4.392317 2.807355 6.247928 6.554589 6.022368 5.882643 6.392317 6.087463
lily         flower  plantae 5.906891 5.247928 4.857981 4.643856 6.357552 4.857981 6.569856 4.906891 6.507795 6.442943
iris         flower  plantae 5.700440 5.129283 4.392317 5.044394 6.658211 5.614710 6.599913 4.643856 5.169925 6.643856
maple tree     tree  plantae 6.614710 6.491853 5.491853 3.906891 3.807355 5.087463 5.930737 5.426265 5.459432 6.169925
pinetree       tree  plantae 6.426265 5.491853 6.459432 1.584963 5.357552 5.523562 5.700440 5.614710 1.000000 6.643856
ginkgo         tree  plantae 4.459432 6.321928 6.149747 5.523562 6.491853 4.459432 6.554589 6.266787 4.906891 5.700440

After constructing the dataset, I drew a PCA plot using autoplot:

pca_df<-prcomp(df[,-c(1,2)], scale.=TRUE)
autoplot(pca_df)
autoplot(pca_df, data=df, colour='Kingdom', frame.type='norm')

[Plot: PCA of the samples, points and normal ellipses colored by Kingdom]
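For reference, the percentages that ggfortify's autoplot shows in its axis labels by default are the share of total variance carried by each component. A minimal base-R sketch that recomputes them from pca_df$sdev; it rebuilds the same example data (written more compactly, as an assumption of equivalence with the question's construction) so it runs on its own:

```r
# Rebuild the example data (same steps as in the question, written compactly)
set.seed(1)
df <- data.frame(
  Class   = c("mammalia", "mammalia", "mammalia", "bird", "bird", "insect", "insect",
              "flower", "flower", "flower", "tree", "tree", "tree"),
  Kingdom = rep(c("animalia", "plantae"), times = c(7, 6)),
  row.names = c("cat", "dog", "rabbit", "chicken", "duck", "butterfly", "ladybug",
                "rose", "lily", "iris", "maple tree", "pinetree", "ginkgo")
)
for (i in 3:20) df[, i] <- sample(100, size = nrow(df), replace = TRUE)
df[, 3:20] <- log(df[, 3:20] + 1, base = 2)

pca_df <- prcomp(df[, -c(1, 2)], scale. = TRUE)

# Each component's variance is sdev^2; dividing by the total gives its share
pct <- 100 * pca_df$sdev^2 / sum(pca_df$sdev^2)
round(pct[1:2], 2)  # shares for PC1 and PC2
```

With scale. = TRUE the total variance equals the number of numeric columns (18 here), since every column is standardised to variance 1.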

In this plot, I also want the points colored according to Class, as in the following plot:

pca_df<-prcomp(df[,-c(1,2)], scale.=TRUE)
autoplot(pca_df)
autoplot(pca_df, data=df, colour='Class')

[Plot: PCA of the samples, points colored by Class]

In summary, I want the ellipses to come from the Kingdom variable and the point colors from the Class variable. How can I combine these two representations in one PCA plot?


Additional attempt
I also tried drawing the PCA plot with another package (factoextra), but it gives the same result.

library(factoextra)

fviz_pca_ind(pca_df,
             col.ind=df$Kingdom,
             addEllipses = TRUE,
             col.ind.sup=df$Class,
             repel=TRUE)

[Plot: fviz_pca_ind() output, grouped by Kingdom only]

Answer:

You can specify the frame.colour for the ellipses and the colour for the points:

autoplot(pca_df, data=df, colour='Class', frame.type='norm', frame.colour = "Kingdom")

[Plot: PCA with points colored by Class and ellipses colored by Kingdom]
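If you prefer to see the layering explicitly, a rough plain-ggplot2 equivalent is sketched below. It rebuilds the same df (compact form, assumed equivalent to the question's construction), plots the raw PC scores from pca_df$x rather than the rescaled scores autoplot may use by default (so axis ranges can differ), and draws one stat_ellipse(type = "norm") per Kingdom underneath the Class-colored points. The name scores is just an illustrative variable name:

```r
library(ggplot2)

# Rebuild the example data (same steps as in the question, written compactly)
set.seed(1)
df <- data.frame(
  Class   = c("mammalia", "mammalia", "mammalia", "bird", "bird", "insect", "insect",
              "flower", "flower", "flower", "tree", "tree", "tree"),
  Kingdom = rep(c("animalia", "plantae"), times = c(7, 6)),
  row.names = c("cat", "dog", "rabbit", "chicken", "duck", "butterfly", "ladybug",
                "rose", "lily", "iris", "maple tree", "pinetree", "ginkgo")
)
for (i in 3:20) df[, i] <- sample(100, size = nrow(df), replace = TRUE)
df[, 3:20] <- log(df[, 3:20] + 1, base = 2)

pca_df <- prcomp(df[, -c(1, 2)], scale. = TRUE)

# One row per sample: first two PC scores plus both grouping variables
scores <- data.frame(pca_df$x[, 1:2], Class = df$Class, Kingdom = df$Kingdom)

p <- ggplot(scores, aes(x = PC1, y = PC2)) +
  # normal-theory ellipse per Kingdom, added first so it sits under the points
  stat_ellipse(aes(colour = Kingdom), type = "norm") +
  # sample points colored by Class
  geom_point(aes(colour = Class), size = 2)

print(p)
```

Because both layers map to colour, they share a single legend listing the Class and Kingdom levels together; mapping the ellipses to linetype = Kingdom instead would give two separate legends.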
