Why do I get different PC values from the same data set when using nsprcomp?

Time:11-28

  nonpca <- nsprcomp(data, ncomp = i, nneg = TRUE, scale. = TRUE)
  names(nonpca)
  SumPC <- rowSums(nonpca$rotation)
  w <- SumPC / sum(SumPC)

My data is a CSV file, and it is the same file every time I run this, yet I get different PC values on each run. ncomp = i comes from a for loop over 1:9.

CodePudding user response:

If you check the help page for nsprcomp, it says:

This package implements two non-negative and/or sparse PCA algorithms which are rooted in expectation-maximization (EM) for a probabilistic generative model of PCA (Sigg and Buhmann, 2008). The nsprcomp algorithm can also be described as applying a soft-thresholding operator to the well-known power iteration method for computing eigenvalues.

If you are used to computing PCA with prcomp or princomp, those use an SVD of the data matrix (prcomp) or an eigendecomposition of the covariance matrix (princomp), as touched on in this post. They are therefore deterministic and return the same values every time.
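As a quick check of that determinism, the sketch below fits prcomp twice on the built-in mtcars data (used here only as a stand-in for your own data) and compares the results. It also cross-checks the standard deviations against an eigendecomposition of the correlation matrix.

```r
# prcomp() is SVD-based, so repeated calls on the same data agree.
fit1 <- prcomp(mtcars, scale. = TRUE)
fit2 <- prcomp(mtcars, scale. = TRUE)
same_rotation <- isTRUE(all.equal(fit1$rotation, fit2$rotation))
print(same_rotation)

# With scale. = TRUE, the squared sdev values are the eigenvalues
# of the correlation matrix.
ev <- eigen(cor(mtcars), symmetric = TRUE)
same_sdev <- isTRUE(all.equal(fit1$sdev, sqrt(ev$values)))
print(same_sdev)
```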

Other ways of computing PCs rely on EM methods (see also this paper). Their solutions are not deterministic, because they start from random initial values, but that is a reasonable trade-off when your dataset is huge, as noted in the help page for nsprcomp:

The nsprcomp algorithm is suitable for large and high-dimensional data sets, because it entirely avoids computing the covariance matrix. It is therefore especially suited to the case where the number of features exceeds the number of observations.
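To illustrate where the randomness enters, here is a minimal sketch of plain power iteration with a random starting vector. It is not the nsprcomp implementation: there is no soft-thresholding or non-negativity step, and the function name power_pc1 and its arguments are made up for this example. It only shows that the iteration works on the data matrix directly, never forming the covariance matrix, and that the random start affects the result.

```r
# Plain power iteration for the first principal axis (illustrative only).
power_pc1 <- function(X, seed, iters = 500) {
  set.seed(seed)
  X <- scale(X, center = TRUE, scale = TRUE)
  w <- rnorm(ncol(X))            # random starting vector
  w <- w / sqrt(sum(w^2))
  for (k in seq_len(iters)) {
    w <- crossprod(X, X %*% w)   # t(X) %*% (X %*% w); no covariance matrix
    w <- w / sqrt(sum(w^2))      # renormalize to unit length
  }
  drop(w)
}

w_a <- power_pc1(mtcars, seed = 1)
w_b <- power_pc1(mtcars, seed = 2)
ref <- prcomp(mtcars, scale. = TRUE)$rotation[, 1]

# Absolute inner products near 1 mean the directions agree up to sign.
print(abs(sum(w_a * w_b)))
print(abs(sum(w_a * ref)))
```

In this unconstrained case, different seeds can change only the sign of the result. Once non-negativity or sparsity constraints are added the problem is no longer convex, so different starting vectors may settle on genuinely different solutions.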

You will see that most of your PCs are very close across runs, perhaps with the sign flipped. If you would like reproducible PCs, set the seed before calling nsprcomp:

set.seed(111)
head(nsprcomp(mtcars)$x[,1:2])
                          PC1        PC2
Mazda RX4          -79.595868  -2.152978
Mazda RX4 Wag      -79.598008  -2.168224
Datsun 710        -133.895409   5.022698
Hornet 4 Drive       8.528272 -44.983404
Hornet Sportabout  128.694362 -30.783879
Valiant            -23.211004 -35.112577
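Applied to the loop in the question, one option is to reset the seed before every fit. The sketch below assumes that the random restarts draw from R's global random number generator, which is what the set.seed call above relies on. It uses mtcars in place of the CSV data, and the helper name run_once is made up for this example.

```r
library(nsprcomp)

# mtcars stands in for the asker's CSV data (an assumption for illustration).
data <- mtcars

run_once <- function(i, seed = 111) {
  set.seed(seed)  # reset the RNG before every fit
  nonpca <- nsprcomp(data, ncomp = i, nneg = TRUE, scale. = TRUE)
  SumPC <- rowSums(nonpca$rotation)
  SumPC / sum(SumPC)  # normalized weights, as in the question
}

weights_a <- lapply(1:9, run_once)
weights_b <- lapply(1:9, run_once)

# Two passes over the same loop should now agree.
print(all.equal(weights_a, weights_b))
```

Setting the seed once before the loop also makes the whole loop reproducible, but then each fit depends on how many random numbers the earlier fits consumed.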

Also check the nrestart parameter to make sure you are not stuck in a poor local maximum:

nrestart: the number of random restarts for computing the principal
          component via expectation-maximization (EM) iterations. The
          solution achieving maximum standard deviation over all random
          restarts is kept. A value greater than one can help to avoid
          poor local maxima.
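A rough way to see whether nrestart matters for your data is to fit with one restart and with many, then compare the standard deviations captured by each component. The sketch below uses mtcars and the values 1 and 20 purely for illustration.

```r
library(nsprcomp)

set.seed(111)
fit_few <- nsprcomp(mtcars, ncomp = 2, nneg = TRUE, scale. = TRUE,
                    nrestart = 1)

set.seed(111)
fit_many <- nsprcomp(mtcars, ncomp = 2, nneg = TRUE, scale. = TRUE,
                     nrestart = 20)

# Standard deviation per component for each setting.
print(fit_few$sdev)
print(fit_many$sdev)
```

If the two sets of values differ noticeably, the single-restart fit was probably stuck in a poorer local maximum, and a larger nrestart is worth the extra run time.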