I ran a pilot survey on a sample of 200. Half the sample (100 respondents) was asked for attitudes (4 items) toward Southeast Asians, and the other half was asked about Vietnamese specifically.
I ran PCA using na.omit and then tried to bind the first component back to my data set. But I found that the cells that are supposed to remain NA were filled with component scores (the 100 scores were repeated twice).
How can I assign component scores only to the rows with non-missing values?
seac <- princomp(scale(na.omit(pilot[, 96:99])))
summary(seac, loadings=TRUE, cutoff=0)
scree(cor(na.omit(pilot[, 96:99])), pc=TRUE, fa = FALSE)
data$seac <- seac$scores[,1]
CodePudding user response:
How about something like this:
inds <- which(apply(pilot[,96:99], 1, function(x)all(!is.na(x))))
seac <- princomp(scale(pilot[inds, 96:99]))
summary(seac, loadings=TRUE, cutoff=0)
scree(cor(pilot[inds, 96:99]), pc=TRUE, fa = FALSE)
data$seac <- NA
data$seac[inds] <- seac$scores[,1]
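Here is the same idea as a self-contained sketch you can run without your data. The toy data frame and the item1 to item4 column names are made up for illustration, standing in for pilot[, 96:99], and complete.cases() is used as a shorter way to find the fully observed rows:

```r
# Hypothetical toy data: 200 respondents, 4 items,
# where the second half was never asked these items.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(200 * 4), ncol = 4))
names(toy) <- paste0("item", 1:4)
toy[101:200, ] <- NA

ok <- complete.cases(toy)          # TRUE for rows with no missing items
pc <- princomp(scale(toy[ok, ]))   # PCA on the complete rows only

toy$seac <- NA_real_               # start with NA everywhere
toy$seac[ok] <- pc$scores[, 1]     # fill in scores only for the complete rows

sum(is.na(toy$seac))               # number of rows left as NA
```

Assigning through the index is what prevents the recycling: a vector of 100 scores assigned directly to a 200-row column is silently repeated, because 200 is a multiple of 100.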
CodePudding user response:
You can use the predict method like so:
predictors <- scale(pilot[, 96:99])
predictors_without_NA <- na.omit(predictors)
seac <- princomp(predictors_without_NA)
cbind(pilot,
predict(seac, newdata = predictors)
)
This will give you NAs for the scores of the rows that contain NAs, though. If you need scores for those rows as well, you can impute the missing values before running princomp, using any of a variety of methods.
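As a rough sketch of the imputation route, here is simple column-mean imputation in base R. The toy matrix and its item1 to item4 column names are made up for illustration, and mean imputation is just the simplest option, not a recommendation. It only makes sense for rows with a few scattered gaps, not for respondents who were never asked the items at all:

```r
# Hypothetical toy data: 50 respondents, 4 items, a few scattered NAs
set.seed(2)
items <- matrix(rnorm(50 * 4), ncol = 4,
                dimnames = list(NULL, paste0("item", 1:4)))
items[cbind(c(3, 10, 27), c(1, 2, 4))] <- NA

# Replace each NA with the mean of its column
imputed <- apply(items, 2, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

pc <- princomp(scale(imputed))
scores <- pc$scores[, 1]   # one score per row, none missing
```

Keep in mind that mean imputation shrinks the variance of the imputed items, so the loadings can shift compared with a complete-case analysis.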