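# i is the loop index (1:9); nneg = TRUE requests non-negative loadings
nonpca <- nsprcomp(data, ncomp = i, nneg = TRUE, scale. = TRUE)
names(nonpca)
# sum the loadings of each variable across components, then normalise to weights
SumPC <- rowSums(nonpca$rotation)
w <- SumPC / sum(SumPC)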
My data is a CSV file, and it is the same file every time I run this, yet I get different PC values on each run. ncomp = i comes from a for loop over 1:9.
CodePudding user response:
If you check the help page for nsprcomp, it says:
This package implements two non-negative and/or sparse PCA algorithms which are rooted in expectation-maximization (EM) for a probabilistic generative model of PCA (Sigg and Buhmann, 2008). The nsprcomp algorithm can also be described as applying a soft-thresholding operator to the well-known power iteration method for computing eigenvalues.
If you are accustomed to calculating PCA with prcomp or princomp, these use an SVD of the data matrix and an eigendecomposition of the covariance matrix, respectively, as touched on in this post. They are therefore deterministic and return the same values every time.
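As a quick check of the deterministic case, the sketch below fits prcomp twice on the built-in mtcars data (a stand-in for your CSV, which is not shown in the question) and compares the loadings. On a given machine the comparison should come back TRUE.

```r
# prcomp() is built on an SVD with no random component,
# so two fits on the same data should agree exactly.
fit1 <- prcomp(mtcars, scale. = TRUE)
fit2 <- prcomp(mtcars, scale. = TRUE)

# Compare the loading matrices of the two fits.
identical(fit1$rotation, fit2$rotation)
```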
There are other methods of calculating PCs that rely on EM; you can also check this paper. Their solutions are not deterministic, because the iterations start from random initial values, but that is a reasonable tradeoff when your dataset is huge, as noted in the help page for nsprcomp:
The nsprcomp algorithm is suitable for large and high-dimensional data sets, because it entirely avoids computing the covariance matrix. It is therefore especially suited to the case where the number of features exceeds the number of observations.
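To see where the run-to-run variation can come from, here is a minimal base-R sketch of plain power iteration started from a random vector. It is only an illustration: it is not the nsprcomp implementation, it omits the soft-thresholding and non-negativity steps, and power_pc1 is a hypothetical helper name. It also works on the data matrix directly rather than forming the covariance matrix.

```r
# Plain power iteration for the first principal axis (illustrative only).
power_pc1 <- function(X, iters = 200) {
  Z <- scale(X)                    # centre and scale the columns
  w <- rnorm(ncol(Z))              # random start: the source of variation
  for (i in seq_len(iters)) {
    w <- crossprod(Z, Z %*% w)     # t(Z) %*% (Z %*% w), no covariance matrix
    w <- w / sqrt(sum(w^2))        # renormalise to unit length
  }
  drop(w)
}

w1 <- power_pc1(mtcars)
w2 <- power_pc1(mtcars)
ref <- prcomp(mtcars, scale. = TRUE)$rotation[, 1]

# The runs agree with prcomp up to sign; the sign depends on the random start.
max(abs(abs(w1) - abs(ref)))
max(abs(abs(w2) - abs(ref)))
```

Without a constraint such as nneg = TRUE, nothing pins down the sign of the result, which is why two unseeded runs can return mirrored components.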
You will see that most of your PCs are very close across runs, perhaps with the sign flipped. If you would like reproducible PCs, you can set the seed:
set.seed(111)
head(nsprcomp(mtcars)$x[,1:2])
                          PC1        PC2
Mazda RX4          -79.595868  -2.152978
Mazda RX4 Wag      -79.598008  -2.168224
Datsun 710        -133.895409   5.022698
Hornet 4 Drive       8.528272 -44.983404
Hornet Sportabout  128.694362 -30.783879
Valiant            -23.211004 -35.112577
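Applied to the loop in the question, one option is to reset the seed immediately before every nsprcomp call, so that each value of ncomp always sees the same random starts. The sketch below assumes the nsprcomp package is installed, uses mtcars as a stand-in for your CSV data, and wraps the question's code in a hypothetical helper called weights_for.

```r
library(nsprcomp)

# Stand-in for the CSV data in the question (assumption).
dat <- mtcars

# Hypothetical helper: reseed, fit, and return the normalised weights.
weights_for <- function(i, seed = 111) {
  set.seed(seed)  # fixes the random starts used inside nsprcomp
  fit <- nsprcomp(dat, ncomp = i, nneg = TRUE, scale. = TRUE)
  s <- rowSums(fit$rotation)
  s / sum(s)
}

# Two full passes over ncomp = 1..9, as in the question's for loop.
run_a <- lapply(1:9, weights_for)
run_b <- lapply(1:9, weights_for)

# With the seed reset before each fit, both passes should match.
identical(run_a, run_b)
```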
But be sure to check the parameter nrestart to make sure you are not stuck in a poor local maximum:
nrestart: the number of random restarts for computing the principal
component via expectation-maximization (EM) iterations. The
solution achieving maximum standard deviation over all random
restarts is kept. A value greater than one can help to avoid
poor local maxima.
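A rough way to judge whether nrestart matters for your data is to compare the component standard deviations from a single start against those from many restarts. The sketch below assumes the nsprcomp package is installed and again uses mtcars in place of your data; the value 20 is arbitrary.

```r
library(nsprcomp)

# Same seed for both fits so they are comparable.
set.seed(111)
fit_1 <- nsprcomp(mtcars, ncomp = 2, nneg = TRUE, scale. = TRUE,
                  nrestart = 1)

set.seed(111)
fit_20 <- nsprcomp(mtcars, ncomp = 2, nneg = TRUE, scale. = TRUE,
                   nrestart = 20)

# Standard deviations per component; a clearly larger value in the
# second fit suggests the single start ended in a poorer local maximum.
fit_1$sdev
fit_20$sdev
```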