R MASS::lda using cov.mve method - reproducibility issues

Time:06-03

I am trying to model some multivariate non-normal data using LDA. I was hoping to get a more robust estimate by choosing method = 'mve'. However, this gives predictions that vary from one fit to the next. A minimal example follows.

library(MASS)
library(caret)
set.seed(1)

data(iris)

acc <- list()
for (i in 1:100) {
    post_hoc <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, method = 'mve')
    conf <- table(list(predicted=predict(post_hoc)$class , observed=iris$Species ))
    acc <- append(acc, as.numeric(confusionMatrix(conf)$overall[1]))
    }
hist(as.numeric(acc))

Looking at the lda.R code, I see that it does not set a seed for the cov.rob function. How can I get reproducible results?

CodePudding user response:

If you call set.seed() immediately before each lda() call, the results will be identical:

f <- \() {
  acc <- list()
  for (i in 1:100) {
    set.seed(1)
    post_hoc <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, method = 'mve')
    conf <- table(list(predicted=predict(post_hoc)$class , observed=iris$Species ))
    acc <- append(acc, as.numeric(confusionMatrix(conf)$overall[1]))
  }
  acc
}

library(MASS); library(caret)

acc1 <- f()
all(sapply(acc1, all.equal, acc1[[1]]))
# [1] TRUE
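
The reason this works can be seen by calling the robust covariance estimator directly. As far as I understand it, cov.rob() with method = "mve" searches over randomly sampled subsets when the data are too large to enumerate, so the result depends on the state of R's random number generator at the time of the call. Here is a minimal sketch, assuming the four numeric iris columns as input; the variable names are only for illustration:

```r
library(MASS)

x <- as.matrix(iris[, 1:4])

# Same seed immediately before each call: the same random subsets are
# drawn, so the two estimates should match exactly.
set.seed(1); fit_a <- cov.rob(x, method = "mve")
set.seed(1); fit_b <- cov.rob(x, method = "mve")
same_seed_equal <- identical(fit_a$cov, fit_b$cov)

# No reseeding: the RNG state has advanced, so different subsets are
# drawn and the estimate may differ.
fit_c <- cov.rob(x, method = "mve")
max_abs_diff <- max(abs(fit_a$cov - fit_c$cov))

print(same_seed_equal)
print(max_abs_diff)
```

In the question, set.seed(1) is called once before the loop, so each of the 100 fits starts from a different RNG state. The whole run is repeatable, but the individual fits inside it are not identical to each other.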

CodePudding user response:

O.K., I've edited a version of lda.R with a set.seed() and the results are reproducible. This is strange.
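
An alternative that avoids editing lda.R is to fix the seed around the call from the outside. Below is a rough sketch: with_fixed_seed() is a hypothetical helper written for this post, not part of MASS. It sets the seed, evaluates the expression, and then puts the caller's RNG state back so the rest of the script is not affected.

```r
library(MASS)

# Hypothetical helper (not part of MASS): evaluate `expr` under a fixed
# seed, then restore whatever RNG state the session had before.
with_fixed_seed <- function(seed, expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(list = ".Random.seed", envir = genv)
    }
  })
  set.seed(seed)
  force(expr)  # expr is lazy, so the lda() call runs here, after set.seed()
}

fit1 <- with_fixed_seed(1, lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                               data = iris, method = "mve"))
fit2 <- with_fixed_seed(1, lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                               data = iris, method = "mve"))

# Compare the fitted discriminant coefficients and the predicted classes.
same_scaling <- identical(fit1$scaling, fit2$scaling)
same_classes <- identical(predict(fit1)$class, predict(fit2)$class)
print(c(same_scaling, same_classes))
```

Because the lda() call is passed as an unevaluated argument, it is still evaluated in the calling environment, so predict() on the fitted object behaves the same as for a direct lda() call.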
