Matrix calculations within an R function-CodePudding

I am trying to code a function which will identify which row of an nxm matrix M is closest to a vector y of length m.

What am I doing wrong in my code, please? I want the function to produce a column vector of length n giving the distance between each row of the matrix and the vector y. I then want to output the number of the row of the matrix that is closest to the vector.

closest.point <- function(M, y) {
  p <- length(y)
  k <- nrow(M)
  T <- matrix(nrow=k)
  T <- for(i in 1:n) 
    for(j in 1:m) {
      (X[i,j] - x[j])^2 + (X[i,j] - x[j])^2
    }
  W <- rowSums(T)
  max(W)
  df[which.max(W),]
}

CodePudding user response：

Although a better approach already exists (avoiding for loops when working with matrices), I would like to give you a fix that keeps your for-loop approach.

There were some mistakes in your function. There are some undefined variables like n, m or X.

Also, try to avoid naming variables T, because R treats T as a shorthand for TRUE. It works, but it can cause errors if later code uses T to mean TRUE.
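
To illustrate the point about T, here is a minimal sketch (the variable names masked and restored are only for illustration). T is an ordinary variable that happens to hold TRUE, so it can be reassigned, whereas TRUE itself is a reserved word.

```r
# T is a regular variable bound to TRUE, so assigning to it masks the built-in.
T <- 0
masked <- isTRUE(T)    # FALSE: code that relies on T meaning TRUE now misbehaves

# Removing the masking variable makes T resolve to the built-in again.
rm(T)
restored <- isTRUE(T)  # TRUE
```

Writing TRUE and FALSE in full sidesteps the problem entirely.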

When looping, you need to index the variable you are updating, i.e. write T.matrix[i, j] rather than just T.matrix, because assigning to T.matrix as a whole overwrites it on every iteration.

closest.point <- function(M, y) {
  k <- nrow(M)
  m <- ncol(M)
  T.matrix <- matrix(nrow = k, ncol = m)

  for (i in 1:k) {
    for (j in 1:m) {
      T.matrix[i, j] <- (M[i,j] - y[j])^2
    }
  }
  W <- rowSums(T.matrix)
  return(which.min(W))
}

# example 1
closest.point(M = rbind(c(1, 1, 1), 
                        c(1, 2, 5)), 
              y = cbind(c(1, 2, 5)))
# [1] 2

# example 2
closest.point(M = rbind(c(1, 1, 1, 1), 
                        c(1, 2, 5, 7)), 
              y = cbind(c(2, 2, 6, 2)))
# [1] 2

CodePudding user response：

You should try to avoid using for loops for operations on vectors and matrices. The dist function (from the stats package, which is loaded by default) calculates the distances between the rows of a matrix. Then which.min gives you the index of the minimal distance.

set.seed(0)
M <- matrix(rnorm(100), ncol = 5)
y <- rnorm(5)

closest_point <- function(M, y) {
    dist_mat <- as.matrix(dist(rbind(M, y)))
    all_distances <- dist_mat[1:nrow(M),ncol(dist_mat)]
    which.min(all_distances)
}

closest_point(M, y)
#>    
#> 14

Created on 2021-12-10 by the reprex package (v2.0.1)

Hope this makes sense, let me know if you have questions.

CodePudding user response：

There are a number of problems here:

p is defined but never used.
Although not wrong, T does not really have to be a matrix; a vector would be sufficient.
Although not wrong, using T as a variable name is dangerous because T also means TRUE.
The code defines T and then immediately throws it away, overwriting it in the next statement. The first definition of T is never used.
A for loop always has the value NULL, so assigning it to T is pointless.
The double for loop doesn't do anything. There are no assignments in it, so the loops have no effect.
The loops refer to m, n, X and x, but these are not defined anywhere.
(X[i,j] - x[j])^2 is repeated. It is only needed once.
Writing max(W) on a line by itself has no effect. It only prints when typed directly at the console; inside a function it does nothing. If you meant to print it, write print(max(W)).
We want the closest point, not the farthest point, so max should be min.
df is used in the last line but is not defined anywhere.
The question is incomplete without a test run.
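
Two of the points above (the value of a for loop, and a bare expression inside a function) are easy to check directly. This is only a small sketch; res, sq and f are made-up names for illustration.

```r
# A for loop evaluates to NULL, so assigning it captures nothing.
res <- for (i in 1:3) i^2
is.null(res)    # TRUE

# Results have to be stored explicitly, one element per iteration.
sq <- numeric(3)
for (i in 1:3) sq[i] <- i^2
sq              # 1 4 9

# Inside a function, a bare expression that is not the last one is
# evaluated and then discarded; only the last expression is returned.
f <- function(W) {
  max(W)        # never printed or returned
  which.min(W)  # this is the return value
}
f(c(3, 1, 2))   # 2
```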

I have tried to make the minimum changes to make this work:

closest.point <- function(M, y) {
  nr <- nrow(M)
  nc <- ncol(M)
  W <- numeric(nr)  # vector having nr zeros
  for(i in 1:nr) {
    for(j in 1:nc) {
      W[i] <- W[i] + (M[i,j] - y[j])^2
    }
  }
  print(W)
  print(min(W))
  M[which.min(W),]
}

set.seed(123)
M <- matrix(rnorm(12), 4); M
##             [,1]       [,2]       [,3]
## [1,] -0.56047565  0.1292877 -0.6868529
## [2,] -0.23017749  1.7150650 -0.4456620
## [3,]  1.55870831  0.4609162  1.2240818
## [4,]  0.07050839 -1.2650612  0.3598138

y <- rnorm(3); y
## [1]  0.4007715  0.1106827 -0.5558411

closest.point(M, y)
## [1] 0.9415062 2.9842785 4.6316069 2.8401691  <--- W
## [1] 0.9415062    <--- min(W)
## [1] -0.5604756  0.1292877 -0.6868529  <-- closest row

That said, the closest row can be computed by a function with a one-line body. We transpose M and subtract y from it. This subtracts y from each column of the transpose, and since the columns of the transpose are the rows of M, it subtracts y from each row of M. Then we take the column sums of the squared differences, find which one is smallest, and subscript M with that index.

closest.point2 <- function(M, y) { 
  M[which.min(colSums((t(M) - y)^2)), ]
}

closest.point2(M, y)
## [1] -0.5604756  0.1292877 -0.6868529  <-- closest row
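
The transpose trick relies on R recycling y down each column. As a rough sketch on a tiny matrix (reusing the values from an earlier example in this thread), the following shows what t(M) - y produces, alongside an alternative using sweep that avoids the transpose. The name closest_row_sweep is made up for illustration, and it returns the row index rather than the row itself.

```r
M <- rbind(c(1, 1, 1),
           c(1, 2, 5))
y <- c(1, 2, 5)

# t(M) is 3 x 2 and y has length 3, so y is recycled down each column:
# every column of t(M), i.e. every row of M, has y subtracted from it.
t(M) - y
#      [,1] [,2]
# [1,]    0    0
# [2,]   -1    0
# [3,]   -4    0

# sweep() expresses the same row-wise subtraction without transposing.
closest_row_sweep <- function(M, y) {
  which.min(rowSums(sweep(M, 2, y)^2))
}
closest_row_sweep(M, y)  # 2
```

Both forms compute the same squared distances; which one to use is mostly a matter of readability.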