I need help working out this question about similarity

Time:10-16

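library(dplyr)
# keep only the four numeric columns
iris1 <- select(iris, -c(Species))
iris1 <- data.frame(iris1)
# pairwise Euclidean distances between all flowers
Q <- dist(iris1, method = "euclidean", diag = TRUE, upper = TRUE)
Q[50]  # 50th stored distance, not the row for flower 50
A <- as.matrix(Q)
A
Q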

Dissimilarity coefficient of iris data

Hi, I have been struggling to find the right code to answer this question. I have computed the dissimilarity coefficients and worked out that there are 150 distinct dissimilarities, but I can't figure out the rest. Please help :)

This is the question: Find the flowers with the most and least similarity to flower 50 in this data. Show your code and output for identification, along with the statement.

This post includes my code and the first part of the dissimilarity coefficient output.

CodePudding user response:

Here is a base R solution.
Define a function that computes the distance between flower 50, the target flower, and each of the other flowers. Then apply it once for each method in a list of distance methods. Finally, get the minimum and maximum for each method.

similarity <- function(method = "euclidean", x, y) {
  apply(x, 1, \(row) {
    stats::dist(rbind(row, y), method = method)
  })  
}

i_target_flower <- 50
target_flower <- unlist(iris[i_target_flower, -5, drop = TRUE])

dist_list <- c("euclidean", "maximum", "manhattan", "canberra", "minkowski")
sim <- lapply(dist_list, similarity, x = as.matrix(iris[-i_target_flower, -5]), y = target_flower)
names(sim) <- dist_list
sim_df <- as.data.frame(sim)

# row names in the original iris data of the most (row 1) and least (row 2) similar flowers
sapply(sim_df, \(x) {
  i <- c(which.min(x), which.max(x))
  row.names(sim_df[i,])
})
#>      euclidean maximum manhattan canberra minkowski
#> [1,] "8"       "8"     "8"       "29"     "8"      
#> [2,] "119"     "119"   "119"     "119"    "119"

# corresponding distances
sapply(sim_df, \(x) x[c(which.min(x), which.max(x))])
#>      euclidean maximum manhattan   canberra minkowski
#> [1,] 0.1414214     0.1       0.2 0.03453322 0.1414214
#> [2,] 6.5145990     5.5      11.0 1.83389310 6.5145990
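
The euclidean and minkowski columns are identical because `stats::dist()` uses `p = 2` by default for the Minkowski distance, which is the Euclidean distance. A minimal sketch of how the `p` argument changes this, using flowers 50 and 8 only as an illustration:

```r
# Two flowers, numeric columns only (column 5 is Species)
x <- as.matrix(iris[c(50, 8), -5])

d_euclid  <- as.numeric(dist(x, method = "euclidean"))
d_mink_p2 <- as.numeric(dist(x, method = "minkowski"))         # p = 2 by default
d_manhat  <- as.numeric(dist(x, method = "manhattan"))
d_mink_p1 <- as.numeric(dist(x, method = "minkowski", p = 1))  # p = 1

# Minkowski with p = 2 matches Euclidean; with p = 1 it matches Manhattan
isTRUE(all.equal(d_euclid, d_mink_p2))
isTRUE(all.equal(d_manhat, d_mink_p1))
```

To get a Minkowski column that differs from the Euclidean one, `similarity()` would need to pass a different `p` on to `dist()`; the function above does not do that.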

Created on 2022-10-16 with reprex v2.0.2
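
If only one distance method is needed, the same result can probably be read straight off the full distance matrix from the question, without a helper function. This is a sketch under that assumption; the names `X`, `D`, `target`, `d`, `nearest` and `farthest` are illustrative and not from the original code.

```r
# Numeric columns only (drop Species, column 5)
X <- as.matrix(iris[, -5])

# Full 150 x 150 matrix of pairwise Euclidean distances
D <- as.matrix(dist(X, method = "euclidean"))

target <- 50
d <- D[target, ]   # distances from flower 50 to every flower
d[target] <- NA    # drop the zero distance of flower 50 to itself

nearest  <- which.min(d)  # most similar flower (NA is ignored)
farthest <- which.max(d)  # least similar flower

c(most_similar = unname(nearest), least_similar = unname(farthest))
d[c(nearest, farthest)]   # the corresponding distances
```

This should agree with the euclidean column above: flower 8 is the most similar to flower 50 and flower 119 the least similar.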
