I am trying to model some multivariate non-normal data with LDA. I was hoping to get a more robust estimate by choosing method = 'mve'. However, this leads to predictions that vary from one fit to the next, even though the data and the call are identical. A minimal example follows.
library(MASS)
library(caret)
set.seed(1)
data(iris)
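acc <- list()
for (i in 1:100) {
  # Same data and same call every iteration; only the RNG state differs.
  post_hoc <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                  data = iris, method = 'mve')
  conf <- table(list(predicted = predict(post_hoc)$class, observed = iris$Species))
  acc <- append(acc, as.numeric(confusionMatrix(conf)$overall[1]))
}
# The histogram shows a spread of accuracies rather than a single value.
hist(as.numeric(acc))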
Looking at the lda.R code, I see it does not set a seed for the cov.rob function. How can I get reproducible results?
CodePudding user response:
If you call set.seed() immediately before lda(), the results will be identical on every iteration:
f <- \() {
  acc <- list()
  for (i in 1:100) {
    # Reset the RNG before every fit so each fit starts from the same state.
    set.seed(1)
    post_hoc <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, method = 'mve')
    conf <- table(list(predicted = predict(post_hoc)$class, observed = iris$Species))
    acc <- append(acc, as.numeric(confusionMatrix(conf)$overall[1]))
  }
  acc
}
library(MASS); library(caret)
acc1 <- f()
all(sapply(acc1, all.equal, acc1[[1]]))
# [1] TRUE
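Resetting to the same seed on every iteration makes all 100 fits identical, which hides the spread the question was looking at. If the goal is instead a set of runs that vary but can be reproduced exactly, one option is to give each run its own seed. The following is only a sketch: the helper name run_once is made up for illustration, and accuracy is computed with base R rather than caret to keep it self-contained.

```r
library(MASS)

# Hypothetical helper: fit the 'mve' LDA under a given seed and
# return the resubstitution accuracy on iris.
run_once <- function(seed) {
  set.seed(seed)
  fit <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = iris, method = "mve")
  mean(predict(fit)$class == iris$Species)
}

# One seed per run: the runs may differ from each other,
# but repeating the whole experiment gives the same vector.
acc_a <- vapply(1:20, run_once, numeric(1))
acc_b <- vapply(1:20, run_once, numeric(1))
identical(acc_a, acc_b)
```

Calling hist(acc_a) then gives the same histogram every time the script is run.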
CodePudding user response:
OK, I edited a copy of lda.R to include a set.seed() call, and the results are now reproducible. This is strange.
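It may be less strange than it looks. As far as I can tell from the MASS documentation, cov.rob searches over random subsamples of the data whenever exhaustive enumeration is not feasible, so its result depends on the state of the random number generator. The sketch below calls cov.rob directly on the four iris measurements (an assumption made for illustration; lda applies it to transformed data internally) and compares estimates with and without reseeding.

```r
library(MASS)

x <- as.matrix(iris[, 1:4])

# Same seed before each call: same subsamples, same estimate.
set.seed(1)
fit_a <- cov.rob(x, method = "mve")
set.seed(1)
fit_b <- cov.rob(x, method = "mve")
same_seed_equal <- isTRUE(all.equal(fit_a$cov, fit_b$cov))

# No reseed: this call continues from the current RNG state,
# so it may pick different subsamples and return a different estimate.
fit_c <- cov.rob(x, method = "mve")
no_reseed_equal <- isTRUE(all.equal(fit_a$cov, fit_c$cov))

same_seed_equal
no_reseed_equal
```

Calling set.seed() in your own script just before lda() has the same effect as patching lda.R, without maintaining a modified copy of the package source.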