PCA analysis with triangular matrix

I was trying to use PCA to analyze my data, but it ends like this:

> head(MEGA)
# A tibble: 6 × 86
  ...1    A2S10A16T18 K3N10E14 Q3H6G8K14 G4L8D14 W2G16Q17C18 H15K16 E3V9D10W14
  <chr>         <dbl>    <dbl>     <dbl>   <dbl>       <dbl>  <dbl>      <dbl>
1 A2S10A…      NA       NA        NA      NA          NA         NA         NA
2 K3N10E…       0.462   NA        NA      NA          NA         NA         NA
3 Q3H6G8…       0.727    0.357    NA      NA          NA         NA         NA
4 G4L8D14       0.583    0.357     0.357  NA          NA         NA         NA
5 W2G16Q…       0.357    0.583     0.727   0.583      NA         NA         NA
6 H15K16        0.357    0.357     0.462   0.357       0.357     NA         NA

> prcomp(MEGA)
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

Can you help me with this?

I am a freshman in bioinformatics. Thank you so much.

CodePudding user response：

Here are the issues you need to solve before you can compute your PCA (MDS):

1. There can't be any missing values, so you need to "mirror" your lower triangular matrix into the upper triangle.
2. The table must be a matrix with rownames and colnames, containing only numeric values (the label column `...1` is character, which is what triggers the `'x' must be numeric` error).
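
Both problems can be confirmed directly on the data before fixing anything. The sketch below uses a small made-up data frame, `toy`, as a stand-in for `MEGA` (the names and values are illustrative, not taken from the original data):

```r
# Hypothetical 3 x 3 stand-in for MEGA: a label column plus a lower triangle
toy <- data.frame(
  label = c("a", "b", "c"),
  a = c(NA, 0.4, 0.7),
  b = c(NA, NA, 0.3),
  c = c(NA_real_, NA, NA)
)

# Which columns are not numeric? prcomp() cannot work with these.
non_numeric <- names(toy)[!vapply(toy, is.numeric, logical(1))]
non_numeric

# How many missing values are there in the numeric part?
n_missing <- sum(is.na(toy[setdiff(names(toy), non_numeric)]))
n_missing
```

Running the same two checks on `MEGA` should point at the `...1` column and at the empty upper triangle.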

Here's how I'd solve these problems in a reproducible example:

library(tidyverse)

#-- Making a fake reproducible example
dist_mat <- dist(matrix(rnorm(1000), ncol = 100), method = "euclidean") %>%
    as.matrix()
dist_mat[upper.tri(dist_mat)] <- NA
mega <- dist_mat %>%
    as.data.frame() %>%
    rownames_to_column(var = "...1") %>%
    as_tibble()

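#-- Creating the matrix (labels become rownames, values stay numeric)
mega_mat <- mega %>%
    as.data.frame() %>%
    column_to_rownames("...1") %>%
    as.matrix()

#-- Mirroring lower on upper triangular
#-- (take the values from the transpose so that cell [i, j] gets the value of [j, i];
#--  copying lower.tri() straight onto upper.tri() misplaces values beyond 3 x 3)
mega_mat[upper.tri(mega_mat)] <- t(mega_mat)[upper.tri(mega_mat)]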

#-- Computing PCA
prcomp(mega_mat)
#> Standard deviations (1, .., p=10):
#>  [1] 5.758257e+00 5.621893e+00 5.289312e+00 5.089903e+00 4.739766e+00
#>  [6] 4.494360e+00 4.458136e+00 4.317035e+00 3.989503e+00 8.301307e-16
#> 
#> Rotation (n x k) = (10 x 10):
#>            PC1          PC2         PC3         PC4         PC5         PC6
#> 1   0.35741259 -0.005146882  0.39400567 -0.47952260  0.02795010  0.14613179
#> 2  -0.12992630 -0.199989467 -0.02870089 -0.63737579 -0.08742113 -0.29440106
#> 3   0.02392526 -0.478265115 -0.10158414  0.44137508  0.16895264 -0.13301094
#> 4   0.53242350 -0.208730464 -0.34645287  0.14446233 -0.41603543  0.05018439
#> 5  -0.45456984  0.282095797  0.16492701  0.17112439 -0.67212158  0.17606230
#> 6  -0.30035787 -0.300319956 -0.20283830 -0.06251480  0.09505941 -0.23999350
#> 7   0.22910453  0.474792604  0.30265124  0.26661252  0.15541762 -0.65105608
#> 8  -0.46965024 -0.101554817  0.08307490  0.01429205  0.22627747 -0.10163731
#> 9  -0.03485274  0.491621247 -0.54080984 -0.08671785  0.41166874  0.33337416
#> 10  0.01430624 -0.201168669  0.50425001  0.19008080  0.29040466  0.48767389
#>            PC7         PC8         PC9      PC10
#> 1   0.21547630 -0.11053044  0.54974532 0.3225596
#> 2  -0.38982043 -0.24769290 -0.41060997 0.2445915
#> 3  -0.24227704 -0.55651526  0.31106933 0.2327022
#> 4   0.25267026  0.15142687 -0.28808603 0.4320741
#> 5  -0.15625939 -0.10455639  0.15870825 0.3376700
#> 6  -0.12305942  0.69359210  0.36761203 0.2766879
#> 7  -0.09611608  0.07981677 -0.08463128 0.2976625
#> 8   0.73091422 -0.16044081 -0.22278636 0.3014421
#> 9  -0.13964194 -0.10677135  0.04031700 0.3794254
#> 10 -0.27664992  0.24140792 -0.36148755 0.2850979

Created on 2022-04-28 by the reprex package (v2.0.1)
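
Two points about the mirroring step are worth checking on a small case. `upper.tri()` and `lower.tri()` both select cells column by column, so the values only end up in their transposed positions if they are read from the transpose. Also, since the input is a matrix of pairwise values, classical MDS via `cmdscale()` from base R's stats package may be closer to what the "(MDS)" remark refers to. The following is a minimal sketch on four made-up points (`pts`, `full`, `m`, `naive`, `mirrored` and `coords` are illustrative names), assuming the values are distances:

```r
# Four made-up points in the plane and their true symmetric distance matrix
pts <- rbind(c(0, 0), c(3, 0), c(0, 4), c(6, 8))
full <- as.matrix(dist(pts))

# Blank out the upper triangle to imitate the lower-triangular input
m <- full
m[upper.tri(m)] <- NA

# Direct copy: both selections run column by column, so cells get misplaced
naive <- m
naive[upper.tri(naive)] <- naive[lower.tri(naive)]
isSymmetric(naive)      # FALSE

# Copy from the transpose: cell (i, j) receives the value of cell (j, i)
mirrored <- m
mirrored[upper.tri(mirrored)] <- t(mirrored)[upper.tri(mirrored)]
isSymmetric(mirrored)   # TRUE

# Classical MDS on the completed distance matrix, keeping two dimensions
coords <- cmdscale(as.dist(mirrored), k = 2)
```

In the original `MEGA` table the diagonal is `NA` as well, so it has to be filled before `prcomp()` will run; 0 is the natural choice only if the values really are distances.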

CodePudding user response：

@Quinten The output of dput(MEGA) looks like this:

0.583333333333333, 0.357142857142857, 0.583333333333333, 0.266666666666667, 0.583333333333333, 0.583333333333333, 0.461538461538462, 0.583333333333333, 0.583333333333333, 0.357142857142857, 0.727272727272727, 0.9, 0.727272727272727, 0.461538461538462, 0.9, 0.357142857142857, 0.583333333333333, 0.461538461538462, 0.461538461538462, 0.727272727272727, 0.583333333333333, 0.357142857142857, 0.461538461538462, 0.357142857142857, 0.461538461538462, 0.727272727272727, 0.583333333333333, 0.461538461538462, 0.461538461538462, 0.583333333333333, 0.461538461538462, 0.583333333333333, 0.727272727272727, 0.461538461538462, 0.357142857142857), L9L13G14L17 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.1875, 0.727272727272727, 0.583333333333333,

This is just part of the data. Thanks.

CodePudding user response：

Thank you so much. But when I ran your code, it showed this:

Error in rownames_to_column(., var = "...1") : could not find function "rownames_to_column"

I guess my raw .xls data does not have rownames?

Here is my data:

[Two images were attached here; they are not available.]
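
A note on the error quoted above: "could not find function" generally means that the package providing the function is not attached, not that the data lacks row names. `rownames_to_column()` and `column_to_rownames()` come from the tibble package, which `library(tidyverse)` attaches, so that line has to run without error first. The call that failed also belongs to the part of the answer that builds the fake example, so it is not needed for real data. Below is a hedged base-R sketch of the remaining steps on a made-up stand-in, `MEGA_toy`. The values are invented, and setting the diagonal to 0 assumes the entries are distances:

```r
# Made-up stand-in for the MEGA table: a label column named "...1"
# followed by a lower-triangular block of numeric values
MEGA_toy <- data.frame(
  id = c("s1", "s2", "s3", "s4"),
  s1 = c(NA, 0.46, 0.73, 0.58),
  s2 = c(NA, NA, 0.36, 0.36),
  s3 = c(NA, NA, NA, 0.36),
  s4 = c(NA_real_, NA, NA, NA)
)
names(MEGA_toy)[1] <- "...1"

# Base-R equivalent of column_to_rownames("...1") followed by as.matrix()
mat <- as.matrix(MEGA_toy[, -1])
rownames(mat) <- MEGA_toy[["...1"]]

# Fill the upper triangle from the transpose
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]

# Assumption: the values are distances, so each item is at distance 0 from itself
diag(mat) <- 0

# The matrix is now numeric and complete, so prcomp() accepts it
pca <- prcomp(mat)
```

With the real data, `MEGA` would take the place of `MEGA_toy` and the first block would be skipped.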