PCA with subsamples and then rebind to the data frame

Time:10-14

I've done a pilot survey on a sample of 200. I asked for attitudes (4 items) toward Southeast Asians for half the sample, 100, and toward Vietnamese specifically for the other half.

I ran PCA using na.omit and then tried to bind the first component back to my data set. But I found that the cells that are supposed to remain NA were filled with component scores (the 100 scores were repeated twice).

How can I assign the scores only to the rows with non-missing values?

seac <- princomp(scale(na.omit(pilot[, 96:99])))
summary(seac, loadings=TRUE, cutoff=0)
scree(cor(na.omit(pilot[, 96:99])), pc=TRUE, fa = FALSE)
data$seac <- seac$scores[,1]

CodePudding user response:

How about something like this:

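library(psych)  # provides scree()

# Row indices where all four items are observed
inds <- which(apply(pilot[,96:99], 1, function(x)all(!is.na(x))))
seac <- princomp(scale(pilot[inds, 96:99]))
summary(seac, loadings=TRUE, cutoff=0)
scree(cor(pilot[inds, 96:99]), pc=TRUE, fa = FALSE)
# Start with NA everywhere, then fill in scores only for the complete rows
data$seac <- NA
data$seac[inds] <- seac$scores[,1]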

CodePudding user response:

You can use the predict method like so:

predictors <- scale(pilot[, 96:99])
predictors_without_NA <- na.omit(predictors)

seac <- princomp(predictors_without_NA)

cbind(pilot,
      predict(seac, newdata = predictors)
      )

This gives NA scores for the rows that contain missing values, though. If you need scores for those rows as well, you can impute the missing values before running princomp by a variety of methods.
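
As one illustration of imputing before princomp, here is a minimal sketch using column-mean imputation. The toy data frame `toy` and the helper `impute_mean` are hypothetical names made up for this example, and mean imputation is only one simple choice; whether any imputation is appropriate depends on why the values are missing.

```r
# Hypothetical toy data: 4 items, 6 respondents, a few missing answers
toy <- data.frame(
  item1 = c(1, 2, NA, 4, 5, 3),
  item2 = c(2, NA, 3, 5, 4, 1),
  item3 = c(5, 3, 2, NA, 1, 4),
  item4 = c(3, 4, 1, 2, NA, 5)
)

# Hypothetical helper: replace each NA in a vector with the vector's mean
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Impute every column, keeping the data frame structure
toy_imputed <- as.data.frame(lapply(toy, impute_mean))

# PCA on the standardized, imputed items
pca <- princomp(scale(toy_imputed))

# One score per row, so the new column lines up with the original rows
toy$pc1 <- pca$scores[, 1]
```

Because every row now has complete data, the score vector has the same length as the data frame and nothing is recycled.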
