Everything works just fine when I have 4 variables or more. But I thought the minimum for a CFA or EFA was 3 variables. When I run the code, it gives me 0's for all the parameters that estimate if it's a good fit or not.
For this purpose, I will use the USArrests dataset.
library(lavaan)
d <- USArrests
abc <- 'abc =~ Murder + Assault + Rape'
fit <- lavaan::cfa(abc, data=d, missing = "FIML", estimator = "MLR")
summary(fit, fit.measures=TRUE)
This is where I'm running into an issue. If I add another variable to the abc model, like UrbanPop, everything works and I can get the CFI, RMSEA and so on. But sometimes I don't have a fourth one and still want to see if they're a good fit.
Results for 4 variables:
Model Test User Model:
                                   Standard   Robust
  Comparative Fit Index (CFI)         0.910    0.923
  RMSEA                               0.281    0.221
  SRMR                                0.073    0.073
And every time I use 3 variables, regardless of the data frame, I get this:
Model Test Baseline Model:
  Comparative Fit Index (CFI)         1.000    1.000
  RMSEA                               0.000    0.000
  SRMR                                0.000    0.000
Thank you!
CodePudding user response:
I thought the minimum for a CFA or EFA was 3 variables. When I run the code, it gives me 0's for all the parameters that estimate if it's a good fit
A 1-factor CFA with 3 indicators is just-identified (df = 0) when no errors correlate. If you have df = 0, then your model provides no opportunity for the data to falsify the model. The chi-square fit statistic is thus 0, and the indices derived from it report perfect fit (CFI = 1, RMSEA = 0), because fit is (arbitrarily) perfect in a just-identified model. Introductory SEM / CFA textbooks discuss identification and fit in more detail, e.g.,
Brown, T. A. (2015). Confirmatory factor analysis for applied research. Guilford.
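To make the df = 0 point concrete, here is a minimal sketch of the parameter count for a one-factor CFA. The helper name cfa_df is hypothetical (it is not part of lavaan), and it assumes the first loading is fixed to 1 and no residuals are correlated, as in the model above.

```r
# Degrees of freedom for a one-factor CFA with p indicators.
# Assumptions: first loading fixed to 1 (marker variable),
# no correlated residuals, covariance structure only.
cfa_df <- function(p) {
  n_moments <- p * (p + 1) / 2   # unique variances and covariances
  n_params  <- (p - 1) + p + 1   # free loadings + residual variances + factor variance
  n_moments - n_params
}

cfa_df(3)  # 6 moments - 6 parameters
cfa_df(4)  # 10 moments - 8 parameters
```

With missing = "FIML" the means are modeled as well, but that adds p observed means and p free intercepts, so the difference is unchanged. On a fitted model, lavaan::fitMeasures(fit, "df") should report the same value.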
CodePudding user response:
Initial Check: Do Your Variables Even Correlate?
I actually don't think this is abnormal at all if you consider the variables you entered into the CFA. The three-indicator model you used includes only variables related to crime. The four-indicator model adds a variable that's more loosely related: urban population. If you check the correlation matrix:
library(ggcorrplot)
library(correlation)
library(dplyr)

# Pairwise correlations among all four USArrests variables
d.cor <- d %>%
  correlation()

ggcorrplot(d.cor,
           lab = TRUE,
           type = "lower")
You'll see the crime statistics are strongly correlated but the urban population variable is not.
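Since the plot itself is not reproduced here, the same comparison can be sketched as plain numbers with base R's cor(). The object name r is just for illustration, and this uses Pearson correlations on the built-in USArrests data.

```r
# Pearson correlation matrix for the built-in USArrests data
r <- cor(USArrests)

# Print rounded values so the pattern is easy to read
print(round(r, 2))

# Compare how Murder relates to another crime variable versus UrbanPop
r["Murder", "Assault"]
r["Murder", "UrbanPop"]
```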
Factor Loadings: Exhibiting Why
When you then look at the factor loadings in the output for the four-indicator model:
# One factor measured by four indicators (lavaan needs "+" between them)
abc <- "
abc =~ Murder + Assault + Rape + UrbanPop
"
fit <- cfa(abc,
           data = d,
           missing = "FIML",
           estimator = "MLR")
summary(fit,
        fit.measures = TRUE)
You will see why. The other variables load fine but the urban population variable is not significant:
Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  abc =~
    Murder            1.000
    Assault          22.911    2.636    8.693    0.000
    Rape              1.796    0.344    5.227    0.000
    UrbanPop          1.064    0.660    1.613    0.107
This is why your metrics change. The first three variables are clearly related to a common construct, whereas UrbanPop is not, and the fit indices for the four-indicator model are rightly reflecting that misfit.
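The unstandardized loadings above depend on each variable's scale (Assault is 22.911 only relative to Murder, whose loading is fixed to 1), so standardized loadings may be easier to compare. A minimal sketch, assuming the same four-indicator model; the names fit4 and loadings are only for illustration:

```r
library(lavaan)

# Same one-factor, four-indicator model as above
fit4 <- cfa("abc =~ Murder + Assault + Rape + UrbanPop",
            data = USArrests,
            missing = "FIML",
            estimator = "MLR")

# Standardized estimates; keep only the factor loadings (op "=~")
std <- standardizedSolution(fit4)
loadings <- std[std$op == "=~", c("lhs", "op", "rhs", "est.std", "pvalue")]
print(loadings)
```

The est.std column puts all four indicators on a common scale, which makes it easier to judge how much weaker the UrbanPop loading is than the others.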