Why can't I do a 3 variable CFA with lavaan on R?

Everything works just fine when I have 4 variables or more. But I thought the minimum for a CFA or EFA was 3 variables. When I run the code, it gives me 0's for all the parameters that estimate if it's a good fit or not.

For this purpose, I will use the USArrests dataset.

library(lavaan)

d <- USArrests

abc <- 'abc =~ Murder + Assault + Rape'

fit <- lavaan::cfa(abc, data=d, missing = "FIML", estimator = "MLR")

summary(fit, fit.measures=TRUE)

This is where I'm running into an issue. If I add another variable to the abc model like UrbanPop, everything works and I will be able to get the CFI, RMSEA and so on. But sometimes I don't have a fourth one and still want to see if they're a good fit.

Results for 4 variables

Model Test User Model:
                                               Standard      Robust
  Comparative Fit Index (CFI)                    0.910       0.923
  RMSEA                                          0.281       0.221
  SRMR                                           0.073       0.073

And every time I use 3 variables, regardless of the data frame, I get this:

Model Test Baseline Model:

  Comparative Fit Index (CFI)                    1.000       1.000
  RMSEA                                          0.000       0.000
  SRMR                                           0.000       0.000

Thank you!

CodePudding user response：

I thought the minimum for a CFA or EFA was 3 variables. When I run the code, it gives me 0's for all the parameters that estimate if it's a good fit

A 1-factor CFA with 3 indicators is just-identified (df = 0) when no errors correlate. With df = 0, the model reproduces the observed covariance matrix exactly, so the data have no opportunity to falsify it. The chi-square fit statistic is thus 0, and the indices you listed sit at their best possible values (CFI = 1, RMSEA = 0, SRMR = 0), because fit is (arbitrarily) perfect in a just-identified model. Introductory SEM / CFA textbooks discuss identification and fit in more detail, e.g.,

Brown, T. A. (2015). Confirmatory factor analysis for applied research. Guilford.
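The df = 0 claim can be checked by counting. The following is a minimal sketch in base R, assuming lavaan's default identification (first loading fixed to 1), uncorrelated residuals, and no mean structure; `cfa_df` is a hypothetical helper name, not a lavaan function. Adding a mean structure, as FIML does, adds one observed mean and one free intercept per indicator, so the result should be unchanged.

```r
# Degrees of freedom for a one-factor CFA with p indicators.
# Assumptions: first loading fixed to 1, uncorrelated residuals,
# no mean structure.
cfa_df <- function(p) {
  n_moments <- p * (p + 1) / 2   # unique variances and covariances
  n_params  <- (p - 1) + p + 1   # free loadings + residual variances + factor variance
  n_moments - n_params
}

cfa_df(3)  # 0: just-identified, fit is perfect by construction
cfa_df(4)  # 2: over-identified, so fit can be tested
```

With three indicators there are 6 observed moments and 6 free parameters, which is why nothing is left over to test.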

CodePudding user response：

Initial Check: Do Your Variables Even Correlate?

I actually don't think this is abnormal at all if you consider the variables you entered into the CFA. The three-indicator CFA you ran included only variables related to crime. The four-indicator model adds a variable that's more loosely related: urban population. If you check the correlation matrix:

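library(ggcorrplot)
library(correlation)
library(dplyr)

# Pairwise correlations among the USArrests variables
# (d is the data frame from the question: d <- USArrests)
d.cor <- d %>% 
  correlation()

# Plot the lower triangle, labelling each cell with its coefficient
ggcorrplot(d.cor,
           lab = TRUE,
           type = "lower")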

You'll see the crime statistics are strongly correlated with one another, but the urban population variable is much less strongly correlated with them.
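The same check can be done numerically without any plotting packages. This is a small sketch using only base R's `cor()` on the built-in `USArrests` data; the object name `r` is just for illustration.

```r
# Pearson correlations among the four USArrests variables (base R only)
r <- round(cor(USArrests), 2)
print(r)

# Correlations of UrbanPop with each of the three crime variables
print(r["UrbanPop", c("Murder", "Assault", "Rape")])
```

Comparing the UrbanPop row with the Murder, Assault and Rape entries gives the same picture as the plot.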

Factor Loadings: Exhibiting Why

When you then look at the factor loadings in the output for the four-indicator model:

abc <- "
abc =~ Murder + Assault + Rape + UrbanPop
"

fit <- cfa(abc,
           data=d,
           missing = "FIML",
           estimator = "MLR")

summary(fit,
        fit.measures = TRUE)

You will see why. The other variables load fine but the urban population variable is not significant:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  abc =~                                              
    Murder            1.000                           
    Assault          22.911    2.636    8.693    0.000
    Rape              1.796    0.344    5.227    0.000
    UrbanPop          1.064    0.660    1.613    0.107

This is why your metrics change. The three crime variables are clearly related to a common construct, whereas UrbanPop is not, so once it is added the model can no longer reproduce the observed covariances well, and the fit indices rightly report that misfit.
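To compare the two models side by side, something like the sketch below may help. It assumes lavaan is installed and uses `fitMeasures()` and `standardizedSolution()`; the object names `fit3`, `fit4`, `df3`, `df4` and `std4` are only illustrative.

```r
library(lavaan)

d <- USArrests

# One-factor models with three and four indicators
fit3 <- cfa('abc =~ Murder + Assault + Rape', data = d,
            missing = "FIML", estimator = "MLR")
fit4 <- cfa('abc =~ Murder + Assault + Rape + UrbanPop', data = d,
            missing = "FIML", estimator = "MLR")

# Degrees of freedom: with df = 0 the fit indices carry no information
df3 <- unname(fitMeasures(fit3, "df"))
df4 <- unname(fitMeasures(fit4, "df"))
print(c(three_indicators = df3, four_indicators = df4))

# Standardized loadings are on a common scale, unlike the raw estimates
std4 <- standardizedSolution(fit4)
print(std4[std4$op == "=~", c("rhs", "est.std", "pvalue")])
```

For the three-indicator model, the standardized loadings and their standard errors are still worth inspecting even though the global fit indices are uninformative.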