I ran into this error when trying to use the `train()` function from the {caret} package to run 100-fold cross-validation on a regression model. The code I ran is as follows:
```r
library(caret)

# Read the dataset and convert character columns to factors
data <- read.csv("synchronic_dataset_full.csv")
data <- as.data.frame(unclass(data), stringsAsFactors = TRUE)

# Cross-validation using train() in {caret}
set.seed(527)
inTraining <- createDataPartition(data$realization, p = .75, list = FALSE)
training <- data[inTraining, ]
testing <- data[-inTraining, ]
fitControl <- trainControl(method = "cv",
                           number = 100)
regression_fit <- train(realization ~ (1|verb/VerbSense) + (1|Corpus) +
                          Variety + Register + FollowVerb + z.WeightRatio +
                          ThemeConcreteness + PrimeTypeCoarse + RecPron +
                          z.RecThematicity + ThemeDef + z.RecHeadFrequency +
                          RecHumaness + RecComplexity + ThemeComplexity + z.TTR +
                          Variety * (RecComplexity + RecPron) +
                          Register * ThemeConcreteness,
                        data = training,
                        method = "glm",
                        metric = "Accuracy",
                        trControl = fitControl)
regression_fit
```
The error says:

```
Error in na.fail.default(list(realization = c(1L, 1L, 2L, 1L, 1L, 1L, : missing values in object
```
I checked the dataset and I am sure it contains no missing/NA values. I also tried adding `na.action = na.exclude` after `trControl = fitControl`, but it didn't help. The dataset can be accessed on this OSF page (note: please delete it after use, as it contains sensitive, unpublished, and not-yet-peer-reviewed information).
Answer:
Just remove the parentheses around `(1|...)` in the model formula.

Another possibility is that `|` is not defined for factors, which I suspect is the case here.
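A minimal reproduction on hypothetical toy data (not the original dataset) supports the second guess. `glm()` has no notion of random effects, so when the model frame is built, a term such as `(1|verb/VerbSense)` is evaluated as an ordinary R expression, i.e. a logical OR. `|` (and `/`) are not meaningful for factors, so R warns and returns `NA` for every row, and caret's formula method, whose default is `na.action = na.fail`, then stops with "missing values in object". The names `toy`, `y` and `g` below are illustrative only.

```r
# Toy data: a two-level response and a grouping factor (hypothetical names)
toy <- data.frame(
  y = factor(c("RecipientFirst", "ThemeFirst", "RecipientFirst", "ThemeFirst")),
  g = factor(c("a", "b", "a", "b"))
)

# `|` is not defined for factors: Ops.factor warns and returns all NAs
or_result <- suppressWarnings(1 | toy$g)
print(or_result)

# model.frame() treats `1 | g` as a plain variable, gets an all-NA column,
# and na.fail() then aborts with the same message as in the question
msg <- tryCatch(
  {
    suppressWarnings(model.frame(y ~ (1 | g), data = toy, na.action = na.fail))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

In other words, the data are fine; the `NA`s are created by the formula itself. Random-intercept terms need a mixed-model fitter such as `lme4::glmer()`; with `method = "glm"` they would have to be dropped from the formula.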
Answer:
I managed to fix the issue and obtain the relevant results with the following code:
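```r
library(lme4)   # glmer()
library(caret)  # createDataPartition(), confusionMatrix()

set.seed(527)
for (i in 1:100) {
  # Draw a new stratified 75/25 train/test split on every iteration
  Train <- createDataPartition(data$realization, p = 0.75, list = FALSE)
  training <- data[Train, ]
  testing <- data[-Train, ]
  mod_fit <- glmer(realization ~ (1|verb/VerbSense) + (1|Corpus) +
                     Variety + Register + FollowVerb + z.WeightRatio +
                     ThemeConcreteness + PrimeTypeCoarse + RecPron +
                     z.RecThematicity + ThemeDef + z.RecHeadFrequency +
                     RecHumaness + RecComplexity + ThemeComplexity + z.TTR +
                     Variety * (RecComplexity + RecPron) +
                     Register * ThemeConcreteness,
                   data = training, family = "binomial")
  # type = "response" gives probabilities; the default ("link") gives log-odds
  pred <- predict(mod_fit, newdata = testing, type = "response",
                  allow.new.levels = TRUE)
  # Reuse the reference levels so confusionMatrix() never sees mismatched factors
  predictions.cat <- factor(ifelse(pred > 0.5, "ThemeFirst", "RecipientFirst"),
                            levels = levels(testing$realization))
  result <- confusionMatrix(data = predictions.cat, reference = testing$realization)
  print(result$overall["Accuracy"])
}
```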
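When classifying with a 0.5 cut-off, it matters which scale `predict()` returns. For `glmer()` fits, `predict()` (`predict.merMod`) returns log-odds unless `type = "response"` is given, so a probability threshold of 0.5 needs `type = "response"`. The sketch below illustrates this on simulated data, using base R's `glm()` as a stand-in because `predict.glm()` has the same `"link"` default; the data and the names `sim`, `x`, `prob` and `link` are assumptions made for illustration.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# Simulated binary outcome with the same labels as in the question
realization <- factor(
  ifelse(runif(n) < plogis(0.8 * x), "ThemeFirst", "RecipientFirst"),
  levels = c("RecipientFirst", "ThemeFirst")
)
sim <- data.frame(realization, x)

# For a factor response, the second level ("ThemeFirst") is the "success"
fit <- glm(realization ~ x, data = sim, family = binomial)

link <- predict(fit, newdata = sim)                     # log-odds (default)
prob <- predict(fit, newdata = sim, type = "response")  # probabilities

# Probabilities are the inverse logit of the link-scale predictions, so
# thresholding log-odds at 0.5 means a probability cut-off of plogis(0.5),
# roughly 0.62, rather than 0.5
stopifnot(isTRUE(all.equal(unname(prob), unname(plogis(link)))))

# Fixing the levels keeps both classes even if only one is ever predicted
pred_cat <- factor(ifelse(prob > 0.5, "ThemeFirst", "RecipientFirst"),
                   levels = levels(sim$realization))
accuracy <- mean(pred_cat == sim$realization)
print(accuracy)
```

Setting `levels =` explicitly also matters for `confusionMatrix()`, which requires `data` and `reference` to have identical levels; `as.factor()` on the predictions alone drops a class that is never predicted.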
As the problem has been solved, and since the dataset is still under construction, the material on the OSF page has been removed.
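The loop in the self-answer refits the model on 100 independent random 75/25 splits. That is repeated hold-out validation (sometimes called Monte Carlo cross-validation) rather than the 100-fold CV originally requested with `trainControl(method = "cv", number = 100)`, where every row falls in exactly one fold and is held out exactly once. The base R sketch below shows k-fold CV on simulated data with `glm()` as the model; `sim`, `fold` and `k` are hypothetical names, the folds are not stratified (unlike caret's `createFolds()`), and `k = 10` only keeps the toy example small.

```r
set.seed(527)
n <- 300
x <- rnorm(n)
realization <- factor(
  ifelse(runif(n) < plogis(x), "ThemeFirst", "RecipientFirst"),
  levels = c("RecipientFirst", "ThemeFirst")
)
sim <- data.frame(realization, x)

k <- 10
# Assign each row to exactly one fold (equal-sized, randomly shuffled)
fold <- sample(rep(seq_len(k), length.out = n))

# Fit on k - 1 folds, score accuracy on the held-out fold
fold_accuracy <- vapply(seq_len(k), function(i) {
  train_set <- sim[fold != i, ]
  test_set  <- sim[fold == i, ]
  fit  <- glm(realization ~ x, data = train_set, family = binomial)
  prob <- predict(fit, newdata = test_set, type = "response")
  pred <- factor(ifelse(prob > 0.5, "ThemeFirst", "RecipientFirst"),
                 levels = levels(sim$realization))
  mean(pred == test_set$realization)
}, numeric(1))

print(round(fold_accuracy, 3))
print(mean(fold_accuracy))
```

Swapping `glm()` for `glmer()` inside the function (with `type = "response"` and `allow.new.levels = TRUE` in `predict()`) would give a k-fold version of the mixed-model evaluation.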