Home > Net >  Identifying Smallest Element in Each Row of a Matrix
Identifying Smallest Element in Each Row of a Matrix

Time:03-31

I have this "cost matrix" in R that represents the "cost" of going from any location to any other location (for a total of 5 locations):

X<-matrix(rnorm(25) , nrow = 5)

rownames(X) <- colnames(X) <- c("Location 1", "Location 2", "Location 3", "Location 4", "Location 5")

               Location 1  Location 2  Location 3 Location 4 Location 5
Location 1  0.4501251  2.30029903 -0.26950735  0.1723589  0.5045694
Location 2  1.1208198  1.38557818  0.25250596 -0.6174514 -0.5324785
Location 3  0.4181011  0.01103208  0.83713132 -0.7649082 -0.5619196
Location 4  0.9372365 -1.04258420  0.08397031  0.1611555  1.8356483
Location 5  1.0201278 -0.56020913  1.14815210  1.0362332 -2.2052776

I would like to find out the "Greedy Shortest Path" that starts from "Location 1" and ends at "Location 1" while visiting each location exactly once.

I think this would look something like this (R getting the minimum value for each row in a matrix, and returning the row and column name) - this code returns the smallest value in each row of the matrix:

result <- t(sapply(seq(nrow(X)), function(i) {
  j <- which.min(X[i,])
  c(paste(rownames(X)[i], colnames(X)[j], sep='/'), X[i,j])
}))

When I look at the results:

print(result)


     [,1]                    [,2]                
[1,] "Location 1/Location 3" "-0.269507349140081"
[2,] "Location 2/Location 4" "-0.617451368699149"
[3,] "Location 3/Location 4" "-0.764908186347014"
[4,] "Location 4/Location 2" "-1.04258420123991" 
[5,] "Location 5/Location 5" "-2.20527763537575" 

I think this is telling me that the "Greedy Shortest Path" (starting from "Location 1") is : 1 to 3, 3 to 4, 4 to 2, 2 to 4 ... but then I get stuck in a "2 to 4, 4 to 2" loop for ever.

  • Can someone please show me how I can find the "Greedy Shortest Path" that starts from "Location 1"?

By doing this manually:

  • Starting at Location 1, the "shortest greedy path" is to Location 4
  • From Location 4, the "shortest greedy path" is to Location 3
  • From Location 3, the "shortest greedy path" is to Location 5
  • From Location 5, the "shortest greedy path" is to Location 2 (since we have already been to Location 3 and Location 4, and we can not re-visit the current Location i.e. Location 5, and can not visit Location 1 since there is still a Location we haven't visited)
  • From Location 2, we now have no choice but to return to Location 1 and finish the journey

I would look to produce the following output:

Path : (1,4), (4,3), (3,5), (5,2), (2,1)
Total Distance = -0.8441315   (-0.7244259)   (-0.3775706)   0.3796208   0.3015059 =  -1.265001
  • Could someone please show me how to modify my code to get this final output?

Thank you!

CodePudding user response:

This keeps track of visited locations and doesn't check them:

set.seed(123)

n <- 5L
X <- matrix(rnorm(n^2), nrow = n)

rownames(X) <- colnames(X) <- paste("Location", 1:n)

shortest_path <- function(x, start = 1L) {
  n <- nrow(x)
  nn <- 1:n
  used <- c(start, integer(n - 1L))

  for (step in 2:n) {
    used[step] <- nn[-used][which.min(x[used[step - 1L], -used])]
  }
  
  data.frame(path = colnames(x)[used], dist = c(0, x[used[1:(n - 1L)]   n*(used[2:n] - 1L)]))
}

df <- shortest_path(X)
X
#>             Location 1 Location 2 Location 3 Location 4 Location 5
#> Location 1 -0.56047565  1.7150650  1.2240818  1.7869131 -1.0678237
#> Location 2 -0.23017749  0.4609162  0.3598138  0.4978505 -0.2179749
#> Location 3  1.55870831 -1.2650612  0.4007715 -1.9666172 -1.0260044
#> Location 4  0.07050839 -0.6868529  0.1106827  0.7013559 -0.7288912
#> Location 5  0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393
df
#>         path       dist
#> 1 Location 1  0.0000000
#> 2 Location 5 -1.0678237
#> 3 Location 3 -0.5558411
#> 4 Location 4 -1.9666172
#> 5 Location 2 -0.6868529

CodePudding user response:

This seems to be a typical Traveling Salesman Problem (TSP), and I believe you can find a bunch of implementation methods.


Here is a base R option by defining a recursive function like below (borrow data from @jblood94's answer)

f <- function(i, S = setdiff(1:ncol(X), i), path = i) {
    if (length(S) == 1) {
        return(list(cost = X[i, S]   X[S, 1], path = c(path, S)))
    }
    vp <- Inf
    for (k in S) {
        r <- Recall(k, setdiff(S, k), c(path, k))
        v <- X[i, k]   r$cost
        if (v <= vp) {
            vp <- v
            l <- list(cost = v, path = r$path)
        }
    }
    l
}

which gives

> f(1)
$cost
[1] -4.507312

$path
[1] 1 5 3 4 2

where

  • f(1) means that the start/end point is 1
  • $cost is the min sum cost
  • $path is the column indices that describe the Hamilton path
  • Related