library(dplyr)
iris1 <- select(iris, -c(Species))
iris1 <- data.frame(iris1)
Q <- dist(iris1, method = "euclidean", diag = TRUE, upper = TRUE)
Q[50]
A <- as.matrix(Q)
A
Q
Dissimilarity coefficient of iris data
Hi, I have been struggling to find the right code to answer this question. I have computed the dissimilarity coefficients and worked out that there are 150 distinct dissimilarities, but I can't figure out how to answer the question itself. Please help :)
This is the question: Find the flowers with the most and least similarity to flower 50 in this data. Show your code and output for identification, along with the statement.
This post includes my code and the first part of the dissimilarity coefficient output.
CodePudding user response:
Here is a base R solution.
Define a function that, for a given distance method, computes the distance between the target flower (flower 50) and each of the other flowers. Then apply it to every method in a list of distance methods. Finally, get the minimum and maximum for each method.
# distance from each row of x to the vector y, using the given method
similarity <- function(method = "euclidean", x, y) {
  apply(x, 1, \(row) {
    stats::dist(rbind(row, y), method = method)
  })
}
i_target_flower <- 50
target_flower <- unlist(iris[i_target_flower, -5, drop = TRUE])
dist_list <- c("euclidean", "maximum", "manhattan", "canberra", "minkowski")
sim <- lapply(dist_list, similarity, x = as.matrix(iris[-i_target_flower, -5]), y = target_flower)
names(sim) <- dist_list
sim_df <- as.data.frame(sim)
# row numbers in the original iris data set:
# first row = most similar flower, second row = least similar flower
sapply(sim_df, \(x) {
  i <- c(which.min(x), which.max(x))
  row.names(sim_df[i, ])
})
#>      euclidean maximum manhattan canberra minkowski
#> [1,] "8"       "8"     "8"       "29"     "8"
#> [2,] "119"     "119"   "119"     "119"    "119"
# corresponding distances
sapply(sim_df, \(x) x[c(which.min(x), which.max(x))])
#>      euclidean maximum manhattan   canberra minkowski
#> [1,] 0.1414214     0.1       0.2 0.03453322 0.1414214
#> [2,] 6.5145990     5.5      11.0 1.83389310 6.5145990
Created on 2022-10-16 with reprex v2.0.2
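If only the Euclidean distance is needed, the full distance matrix from the question can probably be reused directly: row 50 of `as.matrix(dist(...))` holds the distance from flower 50 to every flower. The following is a minimal sketch of that idea, not part of the reprex above; the names `D`, `d50`, `most_similar` and `least_similar` are made up for illustration.

```r
# Distance matrix over the four numeric columns of iris (Species dropped).
D <- as.matrix(dist(iris[, -5], method = "euclidean"))

# Row 50: distances from flower 50 to every flower, named by row number.
d50 <- D[50, ]

# Drop the zero distance from flower 50 to itself before searching.
d50 <- d50[-50]

most_similar  <- names(which.min(d50))  # flower with the smallest distance
least_similar <- names(which.max(d50))  # flower with the largest distance

# Row numbers of the two flowers, then their distances to flower 50.
c(most = most_similar, least = least_similar)
d50[c(most_similar, least_similar)]
```

For the Euclidean case this should agree with the `euclidean` column in the output above (flowers 8 and 119). Note that `which.min()` and `which.max()` return only the first match, so ties would need `which(d50 == min(d50))` instead.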