I have a large dataset from which I intend to draw a 10% sample and run my machine learning model 20 times. To test how this would work, I decided to try it on the iris dataset first. I split the dataset into training and testing sets and then tried a simple while loop, but it doesn't work: I get an error message. Is there something I missed?
### partitioning dataset
part <- sample(1:150, size = 100, replace = F)
training <- iris[part,]
testing <- iris[-part,]
## using a loop
n <-1
while (n<6) {
Train(n)<-training[sample(1:100,0.3*nrow(training), replace = F),]
fit <- randomForest(Species~., data = Train(n))
pred <- predict(fit, testing)
confusionMatrix(pred, testing$Species))
n <- n + 1
}
The error message I got is
Error: unexpected '}' in "}"
CodePudding user response:
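Here is the loop corrected and tested. Two things in the original prevented it from running: Train(n) is function-call syntax, not a valid name to assign to, so a plain variable Train is used instead; and the confusionMatrix() line had an extra closing parenthesis, which breaks parsing and leads to the unexpected '}' error. The confusion matrices are also stored in a list, because values computed inside a loop are not printed automatically.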
suppressPackageStartupMessages({
library(randomForest)
library(caret)
})
set.seed(2022)
part <- sample(1:150, size = 100, replace = FALSE)
training <- iris[part,]
testing <- iris[-part,]
## using a loop
result <- vector("list", 5L)
n <- 1L
while(n < 6L) {
  Train <- training[sample(1:100, 0.3*nrow(training), replace = FALSE), ]
  fit <- randomForest(Species ~ ., data = Train)
  pred <- predict(fit, testing)
  result[[n]] <- confusionMatrix(pred, testing$Species)
  n <- n + 1L
}
## see the first result
result[[1]]
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction setosa versicolor virginica
#> setosa 16 0 0
#> versicolor 0 11 1
#> virginica 0 3 19
#>
#> Overall Statistics
#>
#> Accuracy : 0.92
#> 95% CI : (0.8077, 0.9778)
#> No Information Rate : 0.4
#> P-Value [Acc > NIR] : 1.565e-14
#>
#> Kappa : 0.8778
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#> Class: setosa Class: versicolor Class: virginica
#> Sensitivity 1.00 0.7857 0.9500
#> Specificity 1.00 0.9722 0.9000
#> Pos Pred Value 1.00 0.9167 0.8636
#> Neg Pred Value 1.00 0.9211 0.9643
#> Prevalence 0.32 0.2800 0.4000
#> Detection Rate 0.32 0.2200 0.3800
#> Detection Prevalence 0.32 0.2400 0.4400
#> Balanced Accuracy 1.00 0.8790 0.9250
Created on 2022-05-11 by the reprex package (v2.0.1)
There's nothing to gain from a while loop over a for loop here: you are manually incrementing n, and that's exactly what for loops are meant for.
The equivalent for loop is the following.
result <- vector("list", 5L)
for(n in 1:5) {
  Train <- training[sample(1:100, 0.3*nrow(training), replace = FALSE), ]
  fit <- randomForest(Species ~ ., data = Train)
  pred <- predict(fit, testing)
  result[[n]] <- confusionMatrix(pred, testing$Species)
}
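Since the stated goal is 20 repetitions on a sample of the data, here is a minimal sketch of how the same idea could be scaled up. It is only an illustration under some assumptions: the names n_runs, frac and accuracy are made up for this example, the 0.3 fraction is kept from the code above (for the large dataset it would presumably be 0.1), and accuracy is computed directly as the share of correct predictions rather than taken from confusionMatrix().

```r
library(randomForest)

set.seed(2022)

# Same train/test split as above
part <- sample(nrow(iris), size = 100)
training <- iris[part, ]
testing <- iris[-part, ]

n_runs <- 20L  # hypothetical name: number of repetitions asked for in the question
frac <- 0.3    # hypothetical name: fraction of the training set drawn in each run

# One accuracy value per run; vapply() checks that each run returns a single number
accuracy <- vapply(seq_len(n_runs), function(i) {
  idx <- sample(nrow(training), size = floor(frac * nrow(training)))
  fit <- randomForest(Species ~ ., data = training[idx, ])
  pred <- predict(fit, testing)
  mean(pred == testing$Species)  # share of correctly classified test rows
}, numeric(1))

summary(accuracy)
```

Using nrow(training) instead of the hard-coded 1:100 means the sampling line does not need to change when the training set has a different size.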