How to apply an osrm function to every row of a dataframe

Time:11-19

My question is about applying a complicated function to every row of a table.

I'm trying to find the travel time and route for some pairs of points using the osrm package in R (https://cran.r-project.org/web/packages/osrm/osrm.pdf). My data looks like this, where each row is a pair of origin-destination points:

ID_o ID_d longitude_o latitude_o longitude_d latitude_d
1 5 -122.2925 47.72932 -122.2820 47.73027
2 6 -122.2820 47.73027 -122.2944 47.72293
3 7 -122.3365 47.72512 -122.3153 47.71490
4 8 -122.3264 47.70752 -122.3151 47.70674

I can use the osrmRoute function from osrm to obtain the route for any one row:

time.route1 <- osrmRoute(src = mydata[1, c('longitude_o', 'latitude_o')],
                         dst = mydata[1, c('longitude_d', 'latitude_d')],
                         returnclass = "sf")

I can also write a loop to compute what I need for multiple rows:

time.route2 <- data.frame(matrix(, nrow=4, ncol=5))
for (ix in c(1:4) ) {
  route.temp <- osrmRoute(src = mydata[ix, c('longitude_o', 'latitude_o')],
                          dst = mydata[ix, c('longitude_d', 'latitude_d')],
                          returnclass = "sf")
  time.route2[ix, ] <- route.temp
}

in which I simply apply the function to each row sequentially. But the loop runs slowly (I have millions of rows) and stops unexpectedly when there is an NA in my raw data. The computation for one row is clearly independent of all the others, so it should be possible to run them simultaneously.

Is there a way to compute the rows in parallel, using apply, a map function, or some other method? Simple examples of apply and map functions don't help, since osrmRoute is quite a complicated function.

I tried the following

biroute <- function(geofile, ix=1) {
  osrmRoute(src = geofile[ix, c('longitude_o', 'latitude_o')],
            dst = geofile[ix, c('longitude_d', 'latitude_d')])
}
route <- apply(mydata, 1, biroute)

but an error occurs when executing the osrmRoute function saying "incorrect number of dimensions".

CodePudding user response:

I don't know if this will give you a great improvement, but you could try:

biroute <- function(geofile) {
  osrmRoute(src = geofile[c('longitude_o', 'latitude_o')],
            dst = geofile[c('longitude_d', 'latitude_d')])
}

apply(mydata, 1, biroute)

For the example you showed, this returns:

[[1]]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -122.2924 ymin: 47.72932 xmax: -122.282 ymax: 47.73631
Geodetic CRS:  WGS 84
        src dst duration distance                       geometry
src_dst src dst 4.258333   2.1745 LINESTRING (-122.2923 47.72...

[[2]]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -122.2944 ymin: 47.72289 xmax: -122.282 ymax: 47.73629
Geodetic CRS:  WGS 84
        src dst duration distance                       geometry
src_dst src dst 6.233333   3.0681 LINESTRING (-122.2821 47.73...

[[3]]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -122.3364 ymin: 47.7149 xmax: -122.3153 ymax: 47.72686
Geodetic CRS:  WGS 84
        src dst duration distance                       geometry
src_dst src dst 6.058333   2.7979 LINESTRING (-122.3363 47.72...

[[4]]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -122.3264 ymin: 47.70674 xmax: -122.3151 ymax: 47.7086
Geodetic CRS:  WGS 84
        src dst duration distance                       geometry
src_dst src dst 2.903333    1.139 LINESTRING (-122.3264 47.70...
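
As a side note on why this version works while the one in the question fails: apply() passes each row to the function as a named vector, not as a one-row data frame, so two-dimensional indexing like geofile[ix, c(...)] has nothing to index. Here is a minimal sketch using only base R and a two-row copy of the question's data (no OSRM call involved):

```r
# Two rows with the same column names as in the question.
mydata <- data.frame(
  ID_o = 1:2, ID_d = 5:6,
  longitude_o = c(-122.2925, -122.2820), latitude_o = c(47.72932, 47.73027),
  longitude_d = c(-122.2820, -122.2944), latitude_d = c(47.73027, 47.72293)
)

# apply() converts the data frame to a matrix and hands each row
# to the function as a named numeric vector.
row_classes <- apply(mydata, 1, function(r) class(r))

# Row/column indexing on that vector fails; capture the message
# instead of stopping.
msg <- tryCatch(
  apply(mydata, 1, function(r) r[1, c("longitude_o", "latitude_o")]),
  error = function(e) conditionMessage(e)
)
print(msg)

# Indexing by name alone works and returns the two coordinates per row
# (one column of the result per input row).
src <- apply(mydata, 1, function(r) r[c("longitude_o", "latitude_o")])
print(src)
```

Keep in mind that this relies on all columns being numeric. If the data frame also had a character column, apply() would coerce every value to character before the function sees it.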

CodePudding user response:

One option is to wrap your custom function in purrr::possibly, which captures errors and returns a default value instead of stopping. You can then use the furrr package to run it in parallel.

A custom function and its possibly wrapper:

biroute <- function(longitude_o, latitude_o, longitude_d, latitude_d) {
  osrm::osrmRoute(src = c(longitude_o, latitude_o),
                  dst = c(longitude_d, latitude_d))
}

biroute_possibly <- purrr::possibly(biroute, NA)
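
To see what the wrapper does without needing a routing server, here is a small sketch with a stand-in function. fake_route is hypothetical and only imitates the failure on missing coordinates that the question describes; it does not compute a real route.

```r
library(purrr)

# Hypothetical stand-in for biroute(): it stops on missing coordinates
# and otherwise returns a crude placeholder value, not a real route.
fake_route <- function(longitude_o, latitude_o, longitude_d, latitude_d) {
  coords <- c(longitude_o, latitude_o, longitude_d, latitude_d)
  if (anyNA(coords)) stop("missing coordinate")
  sqrt((longitude_d - longitude_o)^2 + (latitude_d - latitude_o)^2)
}

# possibly() returns a new function that yields `otherwise`
# whenever the wrapped function throws an error.
fake_route_possibly <- possibly(fake_route, otherwise = NA)

ok  <- fake_route_possibly(-122.2925, 47.72932, -122.2820, 47.73027)
bad <- fake_route_possibly(NA, 47.72932, -122.2820, 47.73027)

print(ok)   # a number
print(bad)  # NA instead of an error
```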

And then apply that function using parallel processing. If you have a computer with lots of cores, you can increase workers to take advantage.

library(furrr)
plan(multisession, workers = 2)

future_pmap(mydata[,-c(1:2)], biroute_possibly)
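
The result is a list with one element per row, so you will probably want to find the rows that failed and combine the rest. Below is a rough sketch of one way to do that. It uses purrr::pmap with a hypothetical stand-in function (fake_route) so that it runs without an OSRM server; furrr::future_pmap takes the same arguments, so it should slot in the same way. Note that I use otherwise = NULL here rather than NA, which is my own choice because it makes the failures easy to detect with is.null.

```r
library(purrr)

# Three rows in the question's layout; the second has a missing coordinate.
mydata <- data.frame(
  ID_o = 1:3, ID_d = 5:7,
  longitude_o = c(-122.2925, NA, -122.3365),
  latitude_o  = c(47.72932, 47.73027, 47.72512),
  longitude_d = c(-122.2820, -122.2944, -122.3153),
  latitude_d  = c(47.73027, 47.72293, 47.71490)
)

# Hypothetical stand-in for biroute(): returns a one-row data frame with
# placeholder values, and stops on missing coordinates.
fake_route <- function(longitude_o, latitude_o, longitude_d, latitude_d) {
  if (anyNA(c(longitude_o, latitude_o, longitude_d, latitude_d))) {
    stop("missing coordinate")
  }
  data.frame(dlon = abs(longitude_d - longitude_o),
             dlat = abs(latitude_d - latitude_o))
}
fake_route_possibly <- possibly(fake_route, otherwise = NULL)

# One list element per row; failed rows come back as NULL.
results <- pmap(mydata[, -c(1:2)], fake_route_possibly)
failed  <- map_lgl(results, is.null)

# Keep the IDs of the successful rows next to their results.
routes <- cbind(mydata[!failed, c("ID_o", "ID_d")],
                do.call(rbind, results[!failed]))
print(which(failed))
print(routes)
```

Keeping the failed flag lets you go back and inspect or retry those rows afterwards instead of silently losing them.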