"missing values in object" when using caret::train function

I ran into this error while trying to use the train function in the {caret} package to run a 100-fold cross-validation for a regression model. The code I executed is as follows:

#read the dataset and convert columns to factors
data<-read.csv("synchronic_dataset_full.csv")
data<-as.data.frame(unclass(data), stringsAsFactors = TRUE)

#cross-validation using train() in {caret}
library(caret)
set.seed(527)
inTraining <- createDataPartition(data$realization, p = .75, list = FALSE)
training <- data [ inTraining,]
testing  <- data [-inTraining,]

fitControl <- trainControl(method = "cv",
                           number = 100)

regression_fit <- train(realization ~ (1|verb/VerbSense) +
                                      (1|Corpus) +
                                      Variety +
                                      Register +
                                      FollowVerb +
                                      z.WeightRatio +
                                      ThemeConcreteness +
                                      PrimeTypeCoarse +
                                      RecPron +
                                      z.RecThematicity +
                                      ThemeDef +
                                      z.RecHeadFrequency +
                                      RecHumaness +
                                      RecComplexity +
                                      ThemeComplexity +
                                      z.TTR +
                                      Variety *
                                      (RecComplexity +
                                      RecPron) +
                                      Register *
                                      ThemeConcreteness,
                                      data = training,
                                      method = "glm",
                                      metric = "Accuracy",
                                      trControl = fitControl)
regression_fit

And the error says:

Error in na.fail.default(list(realization = c(1L, 1L, 2L, 1L, 1L, 1L, : missing values in object

I checked the dataset and I am sure it contains no missing/NA values. I also tried adding an extra argument, na.action = na.exclude, after trControl = fitControl, but that did not help. The dataset can be accessed on this OSF page (note: please delete it after use, as it contains sensitive, unpublished, non-peer-reviewed information).

CodePudding user response：

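Just remove the parentheses around (1|...) in the model formula. Another possibility, which I guess is the case here, is that | does not apply to factors: outside mixed-model functions such as glmer, | is the ordinary logical OR operator, and applying it to a factor returns NA with a warning.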
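
To illustrate the point about | and factors, here is a minimal base-R sketch. The toy data frame and its column names (y, Corpus) are hypothetical stand-ins, not the original dataset, and model.frame() is called directly rather than through caret. It shows that a (1 | factor) term evaluates to a column of NAs even when the raw data has none, and that na.fail (the handler named in the error message) then stops with the same message.

```r
# Toy data: the column names are hypothetical stand-ins for the real ones.
d <- data.frame(
  y      = factor(c("a", "b", "a", "b")),
  Corpus = factor(c("c1", "c2", "c1", "c2"))
)

# Outside mixed-model functions, `|` is the ordinary logical OR.
# It is not meaningful for factors, so R warns and returns NA throughout.
or_result <- suppressWarnings(1 | d$Corpus)
print(or_result)

# model.frame() evaluates the term the same way, so the derived column
# is all NA although the raw data contains no missing values.
mf <- suppressWarnings(
  model.frame(y ~ (1 | Corpus), data = d, na.action = na.pass)
)
print(anyNA(d))
print(colSums(is.na(mf)))

# With na.action = na.fail, the same call stops with an error.
err <- tryCatch(
  suppressWarnings(
    model.frame(y ~ (1 | Corpus), data = d, na.action = na.fail)
  ),
  error = function(e) conditionMessage(e)
)
print(err)
```

If this is what happens in the question, checking the raw data for NAs would not reveal the problem, because the missing values only appear in the model frame built from the formula.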

CodePudding user response：

I have managed to fix the issue and obtain relevant results using the following code:

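library(caret)  # createDataPartition(), confusionMatrix()
library(lme4)   # glmer()

set.seed(527)
# 100 repeated random 75/25 splits (repeated holdout rather than k-fold CV)
for (i in 1:100) {
  Train <- createDataPartition(data$realization, p = 0.75, list = FALSE)
  training <- data[Train, ]
  testing <- data[-Train, ]
  mod_fit <- glmer(realization ~ (1|verb/VerbSense) +
                                 (1|Corpus) +
                                 Variety +
                                 Register +
                                 FollowVerb +
                                 z.WeightRatio +
                                 ThemeConcreteness +
                                 PrimeTypeCoarse +
                                 RecPron +
                                 z.RecThematicity +
                                 ThemeDef +
                                 z.RecHeadFrequency +
                                 RecHumaness +
                                 RecComplexity +
                                 ThemeComplexity +
                                 z.TTR +
                                 Variety *
                                 (RecComplexity +
                                 RecPron) +
                                 Register *
                                 ThemeConcreteness, data = training, family = "binomial")
  # type = "response" returns probabilities; the default is the log-odds (link) scale
  pred <- predict(mod_fit, newdata = testing, type = "response", allow.new.levels = TRUE)
  # assumes "ThemeFirst" is the second factor level, i.e. the outcome being modelled
  predictions.cat <- ifelse(pred > 0.5, "ThemeFirst", "RecipientFirst")
  # reuse the reference levels so both factors match even if one class is never predicted
  predictions.cat <- factor(predictions.cat, levels = levels(testing$realization))
  result <- confusionMatrix(data = predictions.cat, reference = testing$realization)
  print(result$overall[1])  # accuracy for this split
}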
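
One detail worth checking in the loop above: predict() for binomial models returns values on the link (log-odds) scale unless type = "response" is requested, so a 0.5 cut-off is only meaningful for probabilities. The sketch below uses simulated data and a plain glm as a stand-in for glmer so that it runs without lme4; all variable names are hypothetical.

```r
set.seed(1)

# Simulated binary outcome with a single predictor (hypothetical names).
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(0.5 + x))
fit <- glm(y ~ x, family = binomial)

new_x <- data.frame(x = c(-1, 0, 1))
link_scale <- predict(fit, newdata = new_x)                     # log-odds (default)
prob_scale <- predict(fit, newdata = new_x, type = "response")  # probabilities

# The two scales are related by the inverse logit, plogis().
same_after_transform <- isTRUE(
  all.equal(unname(plogis(link_scale)), unname(prob_scale))
)
print(same_after_transform)

# A 0.5 cut-off on probabilities corresponds to 0 on the log-odds scale.
class_from_prob <- unname(prob_scale > 0.5)
class_from_link <- unname(link_scale > 0)
print(identical(class_from_prob, class_from_link))
```

Applying a 0.5 threshold to log-odds instead would classify cases with a predicted probability between 0.5 and roughly 0.62 into the wrong category, since plogis(0.5) is about 0.62.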

As the problem has been solved and the dataset is still under construction, the material on the OSF page has been removed.