I want to write a function that first cuts a tree (from hclust) into 2 to 13 groups (12 different cutree results), then calculates the adjusted Rand index (randIndex) between each of these 12 results and a vector I have already stored, and finally stores the adjusted Rand index values in a vector so I can compare them. All I've got is:
for(i in 2:13){
a <- cutree(hclust1, k=i)
randIndex(stored_vector, a)
}
where hclust1 is the hierarchical clustering output and stored_vector is the stored vector I mentioned. I am completely new to programming and would appreciate some help. Thank you.
CodePudding user response:
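Does this work for you? It uses rand.index() from the fossil package, which is the plain Rand index rather than the adjusted one, and compares every pair of cuts with each other. To compare against your own vector instead, pass stored_vector as one of the two arguments.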
library(tidyverse)
library(fossil) # rand.index function
# get a dataset for cutree, change this to your dataset
hc <- hclust(dist(USArrests))
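# numbers of groups to cut the tree into (2:13 in the question)
k <- 2:13
# with a vector k, cutree() returns a matrix with one column per value of k
vec <- cutree(hc, k = k)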
# create an empty tibble to collect the results
df <- tibble(i = numeric(), j = numeric(), result = numeric())
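# nested for loops: compare the cut into i groups with the cut into j groups
for (i in k) {
  for (j in k) {
    # columns of vec are named after k, so select them by name rather than by position
    result <- rand.index(vec[, as.character(i)], vec[, as.character(j)])
    df <- df %>%
      add_row(i = i, j = j, result = result)
  }
}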
# view result, keeping each unordered pair of cuts once
# (i < j also drops the trivial i == j comparisons, where the index is 1)
df %>%
  filter(i < j) %>%
  view()
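For the comparison the question actually asks for (each cut against one stored vector, with the results kept in a vector), a minimal sketch might look like the following. The loop in the question computes each value but never saves it, so the idea here is to collect the values with vapply(). Everything named below is an assumption for illustration: adj_rand() is a hypothetical base-R helper that computes the adjusted Rand index from the contingency table (Hubert-Arabie form), USArrests stands in for your data, and stored_vector is faked by cutting the same tree into 4 groups. If you already have randIndex() loaded, you could call it in place of adj_rand().

```r
# Hypothetical helper (not from any package): adjusted Rand index in base R,
# computed from the contingency table of the two label vectors.
adj_rand <- function(x, y) {
  tab <- table(x, y)
  sum_ij <- sum(choose(tab, 2))           # pairs grouped together in both partitions
  sum_a  <- sum(choose(rowSums(tab), 2))  # pairs grouped together in x
  sum_b  <- sum(choose(colSums(tab), 2))  # pairs grouped together in y
  expected <- sum_a * sum_b / choose(length(x), 2)
  (sum_ij - expected) / ((sum_a + sum_b) / 2 - expected)
}

# Stand-in objects (assumptions): replace both with your own.
hclust1 <- hclust(dist(USArrests))
stored_vector <- cutree(hclust1, k = 4)

# One adjusted Rand index per number of groups, collected in a numeric vector.
ks <- 2:13
ari <- vapply(
  ks,
  function(k) adj_rand(stored_vector, cutree(hclust1, k = k)),
  numeric(1)
)
names(ari) <- ks  # label each value with the k it belongs to

print(round(ari, 3))
```

Because the stand-in stored_vector is itself the 4-group cut, the entry named "4" should come out as 1; with your real vector, names(ari)[which.max(ari)] would give the k whose cut agrees with it best.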