I fit three different kmeans models to the iris data set. Then I would like to compare their Rand index (RI) using a for loop. If the RI of a model is larger than that of the next one, the loop should stop and return that (largest) RI value. For example, if the RI of the first model is larger than the RI of the second model, then break the for loop and return the RI of the first model. In my example, the RI of the second model is larger than that of the third model. Hence, the for loop should break and give me the value for the second model, fit2$cluster. When I run it, it returns the number 1, which is the first model, and that is not correct. If there is a way to return the name of the model, that would be even better. Any help, please?
Here is my try:
library(aricode) # provides the RI() function
fit1 <- kmeans(iris[,-5], centers = 2)
fit2 <- kmeans(iris[,-5], centers = 3)
fit3 <- kmeans(iris[,-5], centers = 4)
fit <- list(fit1$cluster, fit2$cluster, fit3$cluster)
Here is my for loop:
for (i in seq_along(fit)) {
  if (RI(fit[[i]], iris[, 5]) > RI(fit[[i + 1]], iris[, 5])) break
  # x <- RI(fit[[i]], iris[, 5])
  print(i)
}
Answer:
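Not sure why you want to print(): you can read the printed value, but you can't do anything with it. Also, break isn't needed, since you want to run the loop to the end to find the largest RI.
Here we use model 1 as the starting value and update the index w whenever a model has a larger RI than the best one so far.
w <- 1L  # index of the best model so far
for (i in seq_along(fit)[-1L]) {
  # compare against the best model so far, not just the previous one
  if (RI(fit[[i]], iris[, 5]) > RI(fit[[w]], iris[, 5])) {
    w <- i
  }
}
w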
# [1] 2
RI(fit[[w]], iris[, 5])
# [1] 0.8797315
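A minimal sketch of the same loop that computes each model's RI only once and returns the model's name instead of its position. So that it runs without aricode, it uses a hypothetical base-R helper rand_index() as a stand-in for aricode::RI() (the plain Rand index computed from the contingency table of the two partitions). The list names fit1/fit2/fit3 and set.seed(42) are assumptions, so the exact RI values may differ from those shown above.

```r
# Hypothetical stand-in for aricode::RI(): the plain Rand index computed from
# the contingency table of two partitions (base R only).
rand_index <- function(c1, c2) {
  tab <- table(c1, c2)
  n <- sum(tab)
  pairs_both <- sum(choose(tab, 2))          # pairs grouped together in both
  pairs_c1 <- sum(choose(rowSums(tab), 2))   # pairs grouped together in c1
  pairs_c2 <- sum(choose(colSums(tab), 2))   # pairs grouped together in c2
  total <- choose(n, 2)
  # agreements = pairs together in both + pairs apart in both
  (total + 2 * pairs_both - pairs_c1 - pairs_c2) / total
}

set.seed(42)  # kmeans() uses random starts; fix the seed for reproducibility
fit <- list(
  fit1 = kmeans(iris[, -5], centers = 2)$cluster,
  fit2 = kmeans(iris[, -5], centers = 3)$cluster,
  fit3 = kmeans(iris[, -5], centers = 4)$cluster
)

best <- 1L
best_ri <- rand_index(fit[[1L]], iris[, 5])
for (i in seq_along(fit)[-1L]) {
  ri <- rand_index(fit[[i]], iris[, 5])  # compute each model's RI only once
  if (ri > best_ri) {                    # strictly larger: ties keep the earlier model
    best <- i
    best_ri <- ri
  }
}
names(fit)[best]  # name of the best model
best_ri           # its Rand index
```

With aricode installed, replacing rand_index() by RI() should give the same loop.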
Alternatively, it would be much easier if RI() were vectorized, so let's vectorize it with Vectorize()!
RIv <- Vectorize(RI, vectorize.args='c1')
RIv(fit, iris[, 5])
# [1] 0.7636689 0.8797315 0.8295302
To find which model has the largest value, we use which.max():
RIv(fit, iris[, 5]) |> which.max()
# [1] 2
To simply get the largest value, we pipe it into max():
RIv(fit, iris[, 5]) |> max()
# [1] 0.8797315
Or all together:
RIv(fit, iris[, 5]) |> {\(.) {w=which.max(.); data.frame(model=w, value=.[w])}}()
#   model     value
# 1     2 0.8797315
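If you want the model's name rather than its position (as asked in the question), one option is to name the RI vector, because which.max() keeps the element's name. A small sketch, using the three values printed above typed in by hand and assuming the names fit1, fit2, fit3; with real fits you could build the same vector with setNames(RIv(fit, iris[, 5]), c("fit1", "fit2", "fit3")).

```r
# RI values as printed by RIv(fit, iris[, 5]) above, labelled with model names
ri <- c(fit1 = 0.7636689, fit2 = 0.8797315, fit3 = 0.8295302)

best <- which.max(ri)   # named integer: fit2 = 2
names(best)             # "fit2"
ri[[best]]              # 0.8797315
```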
Answer:
It's only printing 1 exactly because the RI of your second model is larger than that of your third model: the if condition is satisfied in the 2nd iteration, so the loop breaks before it can print 2, and only 1 gets printed. Instead, print inside the if block before breaking:
# stop at the second-to-last model so that fit[[i + 1]] always exists
for (i in seq_len(length(fit) - 1L)) {
  if (RI(fit[[i]], iris[, 5]) > RI(fit[[i + 1]], iris[, 5])) {
    print(i)
    break
  }
}
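The same "stop at the first model that beats the next one" rule can also be written without a loop using diff(). A minimal sketch, assuming the RI values have already been computed (here the three values printed by RIv(fit, iris[, 5]) in the previous answer, typed in by hand and labelled fit1, fit2, fit3 for illustration):

```r
# RI values as printed by RIv(fit, iris[, 5]) in the previous answer
ri <- c(fit1 = 0.7636689, fit2 = 0.8797315, fit3 = 0.8295302)

# diff(ri)[i] is ri[i + 1] - ri[i]; a negative value means model i beats model i + 1
first_drop <- which(diff(ri) < 0)[1]   # NA if no model beats its successor
# diff() labels its results with the *later* element's name, so look the name up in ri
names(ri)[first_drop]                  # "fit2"
ri[first_drop]                         # RI of that model
```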