I have this "cost matrix" in R that represents the "cost" of going from any location to any other location (for a total of 5 locations):
X<-matrix(rnorm(25) , nrow = 5)
rownames(X) <- colnames(X) <- c("Location 1", "Location 2", "Location 3", "Location 4", "Location 5")
Location 1 Location 2 Location 3 Location 4 Location 5
Location 1 0.4501251 2.30029903 -0.26950735 0.1723589 0.5045694
Location 2 1.1208198 1.38557818 0.25250596 -0.6174514 -0.5324785
Location 3 0.4181011 0.01103208 0.83713132 -0.7649082 -0.5619196
Location 4 0.9372365 -1.04258420 0.08397031 0.1611555 1.8356483
Location 5 1.0201278 -0.56020913 1.14815210 1.0362332 -2.2052776
I would like to find out the "Greedy Shortest Path" that starts from "Location 1" and ends at "Location 1" while visiting each location exactly once.
I think this would look something like this (R getting the minimum value for each row in a matrix, and returning the row and column name) - this code returns the smallest value in each row of the matrix:
result <- t(sapply(seq(nrow(X)), function(i) {
j <- which.min(X[i,])
c(paste(rownames(X)[i], colnames(X)[j], sep='/'), X[i,j])
}))
When I look at the results:
print(result)
[,1] [,2]
[1,] "Location 1/Location 3" "-0.269507349140081"
[2,] "Location 2/Location 4" "-0.617451368699149"
[3,] "Location 3/Location 4" "-0.764908186347014"
[4,] "Location 4/Location 2" "-1.04258420123991"
[5,] "Location 5/Location 5" "-2.20527763537575"
I think this is telling me that the "Greedy Shortest Path" (starting from "Location 1") is : 1 to 3, 3 to 4, 4 to 2, 2 to 4 ... but then I get stuck in a "2 to 4, 4 to 2" loop for ever.
- Can someone please show me how I can find the "Greedy Shortest Path" that starts from "Location 1"?
By doing this manually:
- Starting at Location 1, the "shortest greedy path" is to Location 4
- From Location 4, the "shortest greedy path" is to Location 3
- From Location 3, the "shortest greedy path" is to Location 5
- From Location 5, the "shortest greedy path" is to Location 2 (since we have already been to Location 3 and Location 4, and we can not re-visit the current Location i.e. Location 5, and can not visit Location 1 since there is still a Location we haven't visited)
- From Location 2, we now have no choice but to return to Location 1 and finish the journey
I would look to produce the following output:
Path : (1,4), (4,3), (3,5), (5,2), (2,1)
Total Distance = -0.8441315 (-0.7244259) (-0.3775706) 0.3796208 0.3015059 = -1.265001
- Could someone please show me how to modify my code to get this final output?
Thank you!
CodePudding user response:
This keeps track of visited locations and doesn't check them:
set.seed(123)
n <- 5L
X <- matrix(rnorm(n^2), nrow = n)
rownames(X) <- colnames(X) <- paste("Location", 1:n)
shortest_path <- function(x, start = 1L) {
n <- nrow(x)
nn <- 1:n
used <- c(start, integer(n - 1L))
for (step in 2:n) {
used[step] <- nn[-used][which.min(x[used[step - 1L], -used])]
}
data.frame(path = colnames(x)[used], dist = c(0, x[used[1:(n - 1L)] n*(used[2:n] - 1L)]))
}
df <- shortest_path(X)
X
#> Location 1 Location 2 Location 3 Location 4 Location 5
#> Location 1 -0.56047565 1.7150650 1.2240818 1.7869131 -1.0678237
#> Location 2 -0.23017749 0.4609162 0.3598138 0.4978505 -0.2179749
#> Location 3 1.55870831 -1.2650612 0.4007715 -1.9666172 -1.0260044
#> Location 4 0.07050839 -0.6868529 0.1106827 0.7013559 -0.7288912
#> Location 5 0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393
df
#> path dist
#> 1 Location 1 0.0000000
#> 2 Location 5 -1.0678237
#> 3 Location 3 -0.5558411
#> 4 Location 4 -1.9666172
#> 5 Location 2 -0.6868529
CodePudding user response:
This seems to be a typical Traveling Salesman Problem (TSP), and I believe you can find a bunch of implementation methods.
Here is a base R option by defining a recursive function like below (borrow data from @jblood94's answer)
f <- function(i, S = setdiff(1:ncol(X), i), path = i) {
if (length(S) == 1) {
return(list(cost = X[i, S] X[S, 1], path = c(path, S)))
}
vp <- Inf
for (k in S) {
r <- Recall(k, setdiff(S, k), c(path, k))
v <- X[i, k] r$cost
if (v <= vp) {
vp <- v
l <- list(cost = v, path = r$path)
}
}
l
}
which gives
> f(1)
$cost
[1] -4.507312
$path
[1] 1 5 3 4 2
where
f(1)
means that the start/end point is1
$cost
is the min sum cost$path
is the column indices that describe the Hamilton path