I was trying to use PCA to analyze my data, but it ended like this:
> head(MEGA)
# A tibble: 6 × 86
...1 A2S10A16T18 K3N10E14 Q3H6G8K14 G4L8D14 W2G16Q17C18 H15K16 E3V9D10W14
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A2S10A… NA NA NA NA NA NA NA
2 K3N10E… 0.462 NA NA NA NA NA NA
3 Q3H6G8… 0.727 0.357 NA NA NA NA NA
4 G4L8D14 0.583 0.357 0.357 NA NA NA NA
5 W2G16Q… 0.357 0.583 0.727 0.583 NA NA NA
6 H15K16 0.357 0.357 0.462 0.357 0.357 NA NA
> prcomp(MEGA)
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Can you help me with this?
I am a freshman in bioinformatics. Thank you so much.
User response:
Here are the issues you need to solve to be able to compute your PCA (MDS):
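- There can't be any missing values, so you need to "mirror" your lower triangle onto the upper triangle. In your table the diagonal is empty (NA) as well, so it also needs a value (0, if the entries are distances).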
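- The table must be a matrix with rownames and colnames that contains only numeric values. In your tibble the first column (...1) holds the sample names as text, which is what triggers the "'x' must be numeric" error.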
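A detail worth checking when mirroring: upper.tri() and lower.tri() both list their cells column by column, so copying lower.tri() values straight into upper.tri() positions only puts each value into its mirror cell for matrices up to 3 x 3. Transposing first avoids that. The 4 x 4 matrix below is made up purely to illustrate the point; isSymmetric() is base R.

```r
# Made-up 4 x 4 lower-triangular table (diagonal 0, upper triangle empty)
m <- matrix(0, nrow = 4, ncol = 4)
m[lower.tri(m)] <- 1:6
m[upper.tri(m)] <- NA

# Naive copy: lower.tri() and upper.tri() both enumerate cells column by
# column, so the k-th lower cell is generally NOT the mirror of the k-th upper cell
naive <- m
naive[upper.tri(naive)] <- naive[lower.tri(naive)]

# Transpose first: t(m)[i, j] is m[j, i], so every upper cell gets its mirror value
mirrored <- m
mirrored[upper.tri(mirrored)] <- t(mirrored)[upper.tri(mirrored)]

print(isSymmetric(naive))     # FALSE: naive[2, 3] is 3 but naive[3, 2] is 4
print(isSymmetric(mirrored))  # TRUE
```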
Here's how I'd solve these problems in a reproducible example:
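library(tidyverse)

#-- Making a fake reproducible example
dist_mat <- dist(matrix(rnorm(1000), ncol = 100), method = "euclidean") %>%
  as.matrix()
dist_mat[upper.tri(dist_mat)] <- NA

mega <- dist_mat %>%
  as.data.frame() %>%
  rownames_to_column(var = "...1") %>%
  as_tibble()

#-- Creating a numeric matrix: move the label column into the rownames
mega_mat <- mega %>%
  as.data.frame() %>%
  column_to_rownames("...1") %>%
  as.matrix()

#-- Mirroring lower on upper triangular
#-- (t() puts element [j, i] into [i, j]; copying lower.tri() values directly
#--  into upper.tri() positions only lines up for matrices up to 3 x 3)
mega_mat[upper.tri(mega_mat)] <- t(mega_mat)[upper.tri(mega_mat)]

#-- Filling the diagonal: a no-op here, but a MEGA-style table leaves it
#-- empty (NA); 0 assumes the values are distances
diag(mega_mat) <- 0

#-- Computing PCA
prcomp(mega_mat)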
#> Standard deviations (1, .., p=10):
#> [1] 5.758257e+00 5.621893e+00 5.289312e+00 5.089903e+00 4.739766e+00
#> [6] 4.494360e+00 4.458136e+00 4.317035e+00 3.989503e+00 8.301307e-16
#>
#> Rotation (n x k) = (10 x 10):
#> PC1 PC2 PC3 PC4 PC5 PC6
#> 1 0.35741259 -0.005146882 0.39400567 -0.47952260 0.02795010 0.14613179
#> 2 -0.12992630 -0.199989467 -0.02870089 -0.63737579 -0.08742113 -0.29440106
#> 3 0.02392526 -0.478265115 -0.10158414 0.44137508 0.16895264 -0.13301094
#> 4 0.53242350 -0.208730464 -0.34645287 0.14446233 -0.41603543 0.05018439
#> 5 -0.45456984 0.282095797 0.16492701 0.17112439 -0.67212158 0.17606230
#> 6 -0.30035787 -0.300319956 -0.20283830 -0.06251480 0.09505941 -0.23999350
#> 7 0.22910453 0.474792604 0.30265124 0.26661252 0.15541762 -0.65105608
#> 8 -0.46965024 -0.101554817 0.08307490 0.01429205 0.22627747 -0.10163731
#> 9 -0.03485274 0.491621247 -0.54080984 -0.08671785 0.41166874 0.33337416
#> 10 0.01430624 -0.201168669 0.50425001 0.19008080 0.29040466 0.48767389
#> PC7 PC8 PC9 PC10
#> 1 0.21547630 -0.11053044 0.54974532 0.3225596
#> 2 -0.38982043 -0.24769290 -0.41060997 0.2445915
#> 3 -0.24227704 -0.55651526 0.31106933 0.2327022
#> 4 0.25267026 0.15142687 -0.28808603 0.4320741
#> 5 -0.15625939 -0.10455639 0.15870825 0.3376700
#> 6 -0.12305942 0.69359210 0.36761203 0.2766879
#> 7 -0.09611608 0.07981677 -0.08463128 0.2976625
#> 8 0.73091422 -0.16044081 -0.22278636 0.3014421
#> 9 -0.13964194 -0.10677135 0.04031700 0.3794254
#> 10 -0.27664992 0.24140792 -0.36148755 0.2850979
Created on 2022-04-28 by the reprex package (v2.0.1)
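rownames_to_column() and column_to_rownames() come from the tibble package, which library(tidyverse) attaches; if tidyverse is not installed or not loaded, R reports that it "could not find function". The same steps can be sketched in base R only. mega_to_matrix() is a hypothetical helper, the toy table is made-up data shaped like head(MEGA), and the sketch assumes that the first column holds the sample labels, that the remaining columns form a square matrix, and that the values are distances (so the empty diagonal is set to 0). Because the answer frames this as MDS, base R's cmdscale() (classical multidimensional scaling, which works on the distances directly) is shown next to prcomp() as an alternative.

```r
# Hypothetical helper (not from any package): MEGA-style lower-triangular
# table -> full numeric matrix. Assumes column 1 holds the labels, the rest
# is square, and the values are distances (so the diagonal becomes 0).
mega_to_matrix <- function(tbl) {
  df <- as.data.frame(tbl)              # tibble or data frame both work
  m <- as.matrix(df[, -1])              # drop the character label column ...
  rownames(m) <- df[[1]]                # ... and use it as rownames instead
  m[upper.tri(m)] <- t(m)[upper.tri(m)] # mirror lower triangle onto upper
  diag(m) <- 0                          # distance of a sample to itself
  m
}

# Made-up toy input shaped like head(MEGA)
toy <- data.frame(
  label = c("s1", "s2", "s3", "s4"),
  s1 = c(NA, 0.46, 0.73, 0.58),
  s2 = c(NA, NA, 0.36, 0.36),
  s3 = c(NA, NA, NA, 0.36),
  s4 = rep(NA_real_, 4)
)

m <- mega_to_matrix(toy)
stopifnot(is.numeric(m), !anyNA(m), isSymmetric(m))

# PCA, as in the answer above
pca <- prcomp(m)
print(summary(pca))

# Classical MDS on the same distances (base R's cmdscale), 2 dimensions
mds <- cmdscale(as.dist(m), k = 2)
print(mds)
```

With the real data this would be m <- mega_to_matrix(MEGA) followed by prcomp(m) or cmdscale(as.dist(m), k = 2).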
User response:
@Quinten The output of dput(MEGA) looks like this:
0.583333333333333, 0.357142857142857, 0.583333333333333, 0.266666666666667, 0.583333333333333, 0.583333333333333, 0.461538461538462, 0.583333333333333, 0.583333333333333, 0.357142857142857, 0.727272727272727, 0.9, 0.727272727272727, 0.461538461538462, 0.9, 0.357142857142857, 0.583333333333333, 0.461538461538462, 0.461538461538462, 0.727272727272727, 0.583333333333333, 0.357142857142857, 0.461538461538462, 0.357142857142857, 0.461538461538462, 0.727272727272727, 0.583333333333333, 0.461538461538462, 0.461538461538462, 0.583333333333333, 0.461538461538462, 0.583333333333333, 0.727272727272727, 0.461538461538462, 0.357142857142857), L9L13G14L17 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.1875, 0.727272727272727, 0.583333333333333,
This is just part of the data. Thanks.
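When the full dput() output is too long to post, printing only a small corner of the table still gives others something they can paste and run. A sketch with a made-up stand-in table (toy); with the real data this would be something like dput(MEGA[1:5, 1:6]).

```r
# Made-up stand-in for MEGA: labels in the first column, lower triangle filled
toy <- data.frame(
  label = c("s1", "s2", "s3"),
  s1 = c(NA, 0.46, 0.73),
  s2 = c(NA, NA, 0.36),
  s3 = rep(NA_real_, 3)
)

# Keep only a small corner (first 2 rows, first 3 columns) ...
corner <- toy[1:2, 1:3]

# ... and print it as R code that others can paste to rebuild it exactly
dput(corner)
```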
User response:
Thank you so much. But when I run your code, I get this:
Error in rownames_to_column(., var = "...1") : could not find function "rownames_to_column"
I guess my raw .xls data do not have rownames?
Here is my data: