Identifying Smallest Element in Each Row of a Matrix-CodePudding

I have this "cost matrix" in R that represents the "cost" of going from any location to any other location (for a total of 5 locations):

X<-matrix(rnorm(25) , nrow = 5)

rownames(X) <- colnames(X) <- c("Location 1", "Location 2", "Location 3", "Location 4", "Location 5")

               Location 1  Location 2  Location 3 Location 4 Location 5
Location 1  0.4501251  2.30029903 -0.26950735  0.1723589  0.5045694
Location 2  1.1208198  1.38557818  0.25250596 -0.6174514 -0.5324785
Location 3  0.4181011  0.01103208  0.83713132 -0.7649082 -0.5619196
Location 4  0.9372365 -1.04258420  0.08397031  0.1611555  1.8356483
Location 5  1.0201278 -0.56020913  1.14815210  1.0362332 -2.2052776

I would like to find out the "Greedy Shortest Path" that starts from "Location 1" and ends at "Location 1" while visiting each location exactly once.

I think this would look something like this (R getting the minimum value for each row in a matrix, and returning the row and column name) - this code returns the smallest value in each row of the matrix:

result <- t(sapply(seq(nrow(X)), function(i) {
  j <- which.min(X[i,])
  c(paste(rownames(X)[i], colnames(X)[j], sep='/'), X[i,j])
}))

When I look at the results:

print(result)


     [,1]                    [,2]                
[1,] "Location 1/Location 3" "-0.269507349140081"
[2,] "Location 2/Location 4" "-0.617451368699149"
[3,] "Location 3/Location 4" "-0.764908186347014"
[4,] "Location 4/Location 2" "-1.04258420123991" 
[5,] "Location 5/Location 5" "-2.20527763537575"

I think this is telling me that the "Greedy Shortest Path" (starting from "Location 1") is : 1 to 3, 3 to 4, 4 to 2, 2 to 4 ... but then I get stuck in a "2 to 4, 4 to 2" loop for ever.

Can someone please show me how I can find the "Greedy Shortest Path" that starts from "Location 1"?

By doing this manually:

Starting at Location 1, the "shortest greedy path" is to Location 4
From Location 4, the "shortest greedy path" is to Location 3
From Location 3, the "shortest greedy path" is to Location 5
From Location 5, the "shortest greedy path" is to Location 2 (since we have already been to Location 3 and Location 4, and we can not re-visit the current Location i.e. Location 5, and can not visit Location 1 since there is still a Location we haven't visited)
From Location 2, we now have no choice but to return to Location 1 and finish the journey

I would look to produce the following output:

Path : (1,4), (4,3), (3,5), (5,2), (2,1)
Total Distance = -0.8441315   (-0.7244259)   (-0.3775706)   0.3796208   0.3015059 =  -1.265001

Could someone please show me how to modify my code to get this final output?

Thank you!

CodePudding user response：

This keeps track of visited locations and doesn't check them:

set.seed(123)

n <- 5L
X <- matrix(rnorm(n^2), nrow = n)

rownames(X) <- colnames(X) <- paste("Location", 1:n)

shortest_path <- function(x, start = 1L) {
  n <- nrow(x)
  nn <- 1:n
  used <- c(start, integer(n - 1L))

  for (step in 2:n) {
    used[step] <- nn[-used][which.min(x[used[step - 1L], -used])]
  }
  
  data.frame(path = colnames(x)[used], dist = c(0, x[used[1:(n - 1L)]   n*(used[2:n] - 1L)]))
}

df <- shortest_path(X)
X
#>             Location 1 Location 2 Location 3 Location 4 Location 5
#> Location 1 -0.56047565  1.7150650  1.2240818  1.7869131 -1.0678237
#> Location 2 -0.23017749  0.4609162  0.3598138  0.4978505 -0.2179749
#> Location 3  1.55870831 -1.2650612  0.4007715 -1.9666172 -1.0260044
#> Location 4  0.07050839 -0.6868529  0.1106827  0.7013559 -0.7288912
#> Location 5  0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393
df
#>         path       dist
#> 1 Location 1  0.0000000
#> 2 Location 5 -1.0678237
#> 3 Location 3 -0.5558411
#> 4 Location 4 -1.9666172
#> 5 Location 2 -0.6868529

CodePudding user response：

This seems to be a typical Traveling Salesman Problem (TSP), and I believe you can find a bunch of implementation methods.

Here is a base R option by defining a recursive function like below (borrow data from @jblood94's answer)

f <- function(i, S = setdiff(1:ncol(X), i), path = i) {
    if (length(S) == 1) {
        return(list(cost = X[i, S]   X[S, 1], path = c(path, S)))
    }
    vp <- Inf
    for (k in S) {
        r <- Recall(k, setdiff(S, k), c(path, k))
        v <- X[i, k]   r$cost
        if (v <= vp) {
            vp <- v
            l <- list(cost = v, path = r$path)
        }
    }
    l
}

which gives

> f(1)
$cost
[1] -4.507312

$path
[1] 1 5 3 4 2

where

f(1) means that the start/end point is 1
$cost is the min sum cost
$path is the column indices that describe the Hamilton path