My question is about applying a complicated function to every row of a table.
I'm trying to find the travel time and route for some pairs of points using the osrm package in R (https://cran.r-project.org/web/packages/osrm/osrm.pdf). My data looks like this, where each row is a pair of origin-destination points:
ID_o | ID_d | longitude_o | latitude_o | longitude_d | latitude_d |
---|---|---|---|---|---|
1 | 5 | -122.2925 | 47.72932 | -122.2820 | 47.73027 |
2 | 6 | -122.2820 | 47.73027 | -122.2944 | 47.72293 |
3 | 7 | -122.3365 | 47.72512 | -122.3153 | 47.71490 |
4 | 8 | -122.3264 | 47.70752 | -122.3151 | 47.70674 |
I can use the osrmRoute function in osrm to obtain the route for any one row:
time.route1 <- osrmRoute(src = mydata[1, c('longitude_o', 'latitude_o')],
                         dst = mydata[1, c('longitude_d', 'latitude_d')],
                         returnclass = "sf")
I can also write a loop to compute what I need for multiple rows:
time.route2 <- data.frame(matrix(NA, nrow = 4, ncol = 5))
for (ix in 1:4) {
  route.temp <- osrmRoute(src = mydata[ix, c('longitude_o', 'latitude_o')],
                          dst = mydata[ix, c('longitude_d', 'latitude_d')],
                          returnclass = "sf")
  # store the result for this row (route.temp, the object created above)
  time.route2[ix, ] <- route.temp
}
in which I simply apply the function to each row sequentially. But the loop runs slowly (I have millions of rows) and stops unexpectedly when there is an NA in my raw data. The computation for one row is clearly independent of all the others, so it should be possible to run them simultaneously.
Is there a way to run the computation for all rows in parallel, using apply, map, or some other method? Simple examples of apply and map don't help, since osrmRoute is quite a complicated function.
I tried the following
biroute <- function(geofile, ix = 1) {
  osrmRoute(src = geofile[ix, c('longitude_o', 'latitude_o')],
            dst = geofile[ix, c('longitude_d', 'latitude_d')])
}
route <- apply(mydata, 1, biroute)
but an error occurs when executing the osrmRoute function, saying "incorrect number of dimensions".
CodePudding user response:
I don't know if this will give you a great improvement, but you could try:
biroute <- function(geofile) {
  osrmRoute(src = geofile[c('longitude_o', 'latitude_o')],
            dst = geofile[c('longitude_d', 'latitude_d')])
}
apply(mydata, 1, biroute)
For your shown example this returns
[[1]]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -122.2924 ymin: 47.72932 xmax: -122.282 ymax: 47.73631
Geodetic CRS: WGS 84
src dst duration distance geometry
src_dst src dst 4.258333 2.1745 LINESTRING (-122.2923 47.72...
[[2]]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -122.2944 ymin: 47.72289 xmax: -122.282 ymax: 47.73629
Geodetic CRS: WGS 84
src dst duration distance geometry
src_dst src dst 6.233333 3.0681 LINESTRING (-122.2821 47.73...
[[3]]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -122.3364 ymin: 47.7149 xmax: -122.3153 ymax: 47.72686
Geodetic CRS: WGS 84
src dst duration distance geometry
src_dst src dst 6.058333 2.7979 LINESTRING (-122.3363 47.72...
[[4]]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -122.3264 ymin: 47.70674 xmax: -122.3151 ymax: 47.7086
Geodetic CRS: WGS 84
src dst duration distance geometry
src_dst src dst 2.903333 1.139 LINESTRING (-122.3264 47.70...
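The reason the original attempt failed appears to be that apply(mydata, 1, f) hands f each row as a plain named vector rather than a one-row data frame, so two-index subsetting such as geofile[ix, cols] has no dimensions to work with. A minimal sketch of this, using a hypothetical stand-in function (describe_row, which makes no OSRM request) and only the first two rows of the example data:

```r
# Toy data with the same coordinate columns as the question (first two rows only)
mydata <- data.frame(
  longitude_o = c(-122.2925, -122.2820),
  latitude_o  = c(47.72932, 47.73027),
  longitude_d = c(-122.2820, -122.2944),
  latitude_d  = c(47.73027, 47.72293)
)

# Hypothetical stand-in for biroute(): reports what apply() passes in
describe_row <- function(row) {
  list(
    has_dim = !is.null(dim(row)),                  # a plain vector has no dimensions
    src     = row[c("longitude_o", "latitude_o")]  # one-index subsetting by name works
  )
}

rows <- apply(mydata, 1, describe_row)

# Two-index subsetting on a vector reproduces the error from the question
err <- tryCatch(
  apply(mydata, 1, function(row) row[1, c("longitude_o", "latitude_o")]),
  error = function(e) conditionMessage(e)
)
```

Here rows[[1]]$has_dim is FALSE and err holds the error message, which is why dropping the ix index lets the call go through.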
CodePudding user response:
One option is to wrap your custom function in purrr::possibly to capture errors without stopping the whole run. You can then use the furrr package to run it in parallel.
A custom function and the possibly wrapper:
biroute <- function(longitude_o, latitude_o, longitude_d, latitude_d) {
  osrm::osrmRoute(src = c(longitude_o, latitude_o),
                  dst = c(longitude_d, latitude_d))
}
biroute_possibly <- purrr::possibly(biroute, NA)
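To see what the wrapper does without contacting a routing server, here is a small sketch. fake_route is a hypothetical stand-in that simply errors on missing coordinates (the question reports that the loop stops on NA); it does not compute a real route.

```r
library(purrr)

# Hypothetical stand-in for biroute(): errors when any coordinate is NA
fake_route <- function(longitude_o, latitude_o, longitude_d, latitude_d) {
  coords <- c(longitude_o, latitude_o, longitude_d, latitude_d)
  if (anyNA(coords)) stop("missing coordinate")
  sum(abs(coords[1:2] - coords[3:4]))  # placeholder number, not a real distance
}

# Same wrapping as above: return NA instead of raising the error
fake_route_possibly <- possibly(fake_route, otherwise = NA)

ok  <- fake_route_possibly(-122.2925, 47.72932, -122.2820, 47.73027)
bad <- fake_route_possibly(NA, 47.73027, -122.2944, 47.72293)
```

The first call returns a number as usual; the second returns NA rather than aborting, so one bad row no longer stops the remaining rows.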
Then apply that function using parallel processing. If you have a computer with lots of cores, you can increase workers to take advantage of them.
library(furrr)
plan(multisession, workers = 2)
future_pmap(mydata[,-c(1:2)], biroute_possibly)
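The result is a list with one element per row, where failed rows hold NA. One possible way to drop the failures and keep track of which pairs succeeded is sketched below. It makes several assumptions: fake_route is a hypothetical stand-in returning a one-row data frame, purrr::pmap is used as a sequential stand-in for future_pmap (which takes the same arguments), and the toy data contains a deliberate NA.

```r
library(purrr)

# Toy data shaped like the question's table, with one missing coordinate
mydata <- data.frame(
  ID_o = 1:3, ID_d = 5:7,
  longitude_o = c(-122.2925, NA, -122.3365),
  latitude_o  = c(47.72932, 47.73027, 47.72512),
  longitude_d = c(-122.2820, -122.2944, -122.3153),
  latitude_d  = c(47.73027, 47.72293, 47.71490)
)

# Hypothetical stand-in for biroute(): one-row data frame, errors on NA
fake_route <- function(longitude_o, latitude_o, longitude_d, latitude_d) {
  if (anyNA(c(longitude_o, latitude_o, longitude_d, latitude_d))) {
    stop("missing coordinate")
  }
  data.frame(dx = longitude_d - longitude_o, dy = latitude_d - latitude_o)
}
fake_route_possibly <- possibly(fake_route, otherwise = NA)

# Sequential stand-in for future_pmap(); column names match argument names
res <- pmap(mydata[, -c(1:2)], fake_route_possibly)

# Keep only the rows that succeeded, alongside their origin/destination IDs
succeeded <- map_lgl(res, is.data.frame)
routes <- cbind(mydata[succeeded, c("ID_o", "ID_d")],
                do.call(rbind, res[succeeded]))
```

With real osrmRoute output the same filtering idea should carry over, though binding sf objects together would need the sf package loaded and is not shown here.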