How to loop st_distance through list

Time:07-10

My goal is to apply the st_distance function to a very large data frame. Because the data frame covers multiple individuals, I split it using the purrr package and the split function.

I have seen lists and for loops used in the past, but I have no experience with them.

Below is a fraction of my dataset. I have split the data frame by ID into a list with 43 elements.

The st_distance call I plan to use looks something like this, as it would be applied to the full data frame rather than to the split list:

st_distance(df[,-nrow(df)], df[-1,], by_element=TRUE)

I've tried the following, but it does not work:

sapply(1:nrow(list), 
       function(i){
             st_distance(
             list[i,-nrow(list)],
             list[-1], by_element=TRUE)
         )
       }
)

Here's a small example dataset:

ID      Date        Time        Datetime        Long        Lat
10_17   4/18/2017   15:02:00    4/18/2017 15:02 379800.5    5181001
10_17   4/20/2017   6:00:00     4/20/2017 6:00  383409      5179885
10_17   4/21/2017   21:02:00    4/21/2017 21:02 383191.2    5177960
10_24   4/22/2017   10:03:00    4/22/2017 10:03 383448.6    5179918
10_17   4/23/2017   12:01:00    4/23/2017 12:01 378582.5    5182110
10_24   4/24/2017   1:00:00     4/24/2017 1:00  383647.4    5180009
10_24   4/25/2017   16:01:00    4/25/2017 16:01 383407.9    5179872
10_17   4/26/2017   18:02:00    4/26/2017 18:02 380691.9    5179353
10_36   4/27/2017   20:00:00    4/27/2017 20:00 382521.9    5175266
10_36   4/29/2017   11:01:00    4/29/2017 11:01 383443.8    5179909
10_36   4/30/2017   0:00:00     4/30/2017 0:00  383060.8    5178361
10_40   4/30/2017   13:02:00    4/30/2017 13:02 383426.3    5179873
10_40   5/2/2017    17:02:00    5/2/2017 17:02  383393.7    5179883
10_40   5/3/2017    6:01:00     5/3/2017 6:01   382875.8    5179376
10_88   5/3/2017    19:02:00    5/3/2017 19:02  383264.3    5179948
10_88   5/4/2017    8:01:00     5/4/2017 8:01   378554.4    5181966
10_88   5/4/2017    21:03:00    5/4/2017 21:03  379830.5    5177232

Answer:

Here is a basic solution. I split the original data into multiple data frames using split() and then wrapped the distance calculation in lapply().

data <- read.table(header=TRUE, text="ID      Date        Time        Datetime  time2      Long        Lat
10_17   4/18/2017   15:02:00    4/18/2017 15:02 379800.5    5181001
10_17   4/20/2017   6:00:00     4/20/2017 6:00  383409      5179885
10_17   4/21/2017   21:02:00    4/21/2017 21:02 383191.2    5177960
10_24   4/22/2017   10:03:00    4/22/2017 10:03 383448.6    5179918
10_17   4/23/2017   12:01:00    4/23/2017 12:01 378582.5    5182110
10_24   4/24/2017   1:00:00     4/24/2017 1:00  383647.4    5180009
10_24   4/25/2017   16:01:00    4/25/2017 16:01 383407.9    5179872
10_17   4/26/2017   18:02:00    4/26/2017 18:02 380691.9    5179353
10_36   4/27/2017   20:00:00    4/27/2017 20:00 382521.9    5175266
10_36   4/29/2017   11:01:00    4/29/2017 11:01 383443.8    5179909
10_36   4/30/2017   0:00:00     4/30/2017 0:00  383060.8    5178361
10_40   4/30/2017   13:02:00    4/30/2017 13:02 383426.3    5179873
10_40   5/2/2017    17:02:00    5/2/2017 17:02  383393.7    5179883
10_40   5/3/2017    6:01:00     5/3/2017 6:01   382875.8    5179376
10_88   5/3/2017    19:02:00    5/3/2017 19:02  383264.3    5179948
10_88   5/4/2017    8:01:00     5/4/2017 8:01   378554.4    5181966
10_88   5/4/2017    21:03:00    5/4/2017 21:03  379830.5    5177232")

# EPSG:32615 (WGS 84 / UTM zone 15N) is the coordinate reference system set below
library(sf)
library(magrittr)

dfs <- split(data, data$ID) 

answer <- lapply(dfs, function(df) {
   # convert to an sf object and specify the coordinate system
   start <- df[-1 , c("Long", "Lat")] %>% 
      st_as_sf(coords = c('Long', 'Lat')) %>%
      st_set_crs(32615)
   
   end <- df[-nrow(df), c("Long", "Lat")] %>% 
      st_as_sf(coords = c('Long', 'Lat')) %>%
      st_set_crs(32615)
   
   #long_lat <-st_transform(start, 4326)
   distances <-sf::st_distance(start, end, by_element = TRUE) 
   
   df$distances <- c(NA, distances)
   df
})

answer
$`10_17`
     ID      Date     Time  Datetime time2     Long     Lat distances
1 10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001     NA
2 10_17 4/20/2017  6:00:00 4/20/2017  6:00 383409.0 5179885  3777.132
3 10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960  1937.282
5 10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110  6201.824
8 10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353  3471.400

$`10_24`
     ID      Date     Time  Datetime time2     Long     Lat distances
4 10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918    NA
6 10_24 4/24/2017  1:00:00 4/24/2017  1:00 383647.4 5180009  218.6377
7 10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872  275.9153
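
Since the result is a list with one data frame per ID, it may be convenient to stack it back into a single data frame afterwards. The following is only a sketch: `answer_demo` is a hypothetical toy stand-in for the `answer` list above, with just two columns, so that the snippet runs on its own.

```r
# Hypothetical toy stand-in for the `answer` list: one data frame per ID
answer_demo <- list(
  `10_17` = data.frame(ID = "10_17", distances = c(NA, 3777.132)),
  `10_24` = data.frame(ID = "10_24", distances = c(NA, 218.6377))
)

# Stack the per-ID data frames back into one data frame
combined <- do.call(rbind, answer_demo)

# Drop the "10_17.1"-style row names that rbind builds from the list names
rownames(combined) <- NULL

combined
```

With the real list, `do.call(rbind, answer)` should work the same way, as long as every element has the same columns.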

There should be an easier way to calculate distances between rows instead of creating 2 series of points.
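
One possible simplification, assuming the coordinates are projected and in metres (as with the UTM CRS used above), is to skip the two point series and compute the planar distance between consecutive rows directly with diff(). This is a sketch rather than a replacement for st_distance: it does not apply to longitude/latitude in degrees, and `pts` and `step_dist` are names made up for this illustration.

```r
# Small subset of the example data (projected coordinates, assumed to be metres)
pts <- data.frame(
  ID   = c("10_24", "10_24", "10_24", "10_36", "10_36"),
  Long = c(383448.6, 383647.4, 383407.9, 382521.9, 383443.8),
  Lat  = c(5179918, 5180009, 5179872, 5175266, 5179909)
)

# Planar distance from the previous row to the current row; NA for the first row
step_dist <- function(x, y) c(NA, sqrt(diff(x)^2 + diff(y)^2))

# Split by ID and add the distance column to each piece
result <- lapply(split(pts, pts$ID), function(d) {
  d$distances <- step_dist(d$Long, d$Lat)
  d
})

result
```

For ID 10_24 this should reproduce the 218.6377 and 275.9153 shown in the output above, because for projected coordinates the distance between two points is the ordinary Euclidean distance.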

Referenced: Converting table columns to spatial objects
