My goal is to apply the st_distance function to a very large data frame, but because the data frame covers multiple individuals, I split it using the purrr package and the split function.
I have seen lists and for loops used in the past, but I have no experience with them.
Below is a fraction of my dataset. I have split the data frame by ID into a list with 43 elements.
The st_distance call I plan to use looks something like this when applied to the full data frame (not split into a list):
st_distance(df[,-nrow(df)], df[-1,], by_element=TRUE)
I've tried the following, but it does not work:
sapply(1:nrow(list),
function(i){
st_distance(
list[i,-nrow(list)],
list[-1], by_element=TRUE)
)
}
)
Here's a small example dataset:
ID Date Time Datetime Long Lat
10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001
10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409 5179885
10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960
10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918
10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110
10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009
10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872
10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353
10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266
10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909
10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361
10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873
10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883
10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376
10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948
10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966
10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232
CodePudding user response:
Here is a basic solution. I split the original data into multiple data frames using split() and then wrapped the distance function in lapply().
data <- read.table(header=TRUE, text="ID Date Time Datetime time2 Long Lat
10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001
10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409 5179885
10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960
10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918
10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110
10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009
10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872
10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353
10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266
10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909
10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361
10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873
10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883
10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376
10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948
10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966
10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232")
# The coordinates are assumed to be in EPSG:32615 (WGS 84 / UTM zone 15N)
library(sf)
library(magrittr)
dfs <- split(data, data$ID)
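answer <- lapply(dfs, function(df) {
  # Convert to sf objects and specify the coordinate system
  # Rows 2..n: the later point of each consecutive pair
  start <- df[-1, c("Long", "Lat")] %>%
    st_as_sf(coords = c('Long', 'Lat')) %>%
    st_set_crs(32615)
  # Rows 1..(n-1): the earlier point of each consecutive pair
  end <- df[-nrow(df), c("Long", "Lat")] %>%
    st_as_sf(coords = c('Long', 'Lat')) %>%
    st_set_crs(32615)
  #long_lat <-st_transform(start, 4326)
  # Element-wise distance between consecutive points
  distances <- sf::st_distance(start, end, by_element = TRUE)
  # The first row of each ID has no previous point, so its distance is NA
  df$distances <- c(NA, distances)
  df
})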
answer
$`10_17`
ID Date Time Datetime time2 Long Lat distances
1 10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 NA
2 10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409.0 5179885 3777.132
3 10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 1937.282
5 10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 6201.824
8 10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 3471.400
$`10_24`
ID Date Time Datetime time2 Long Lat distances
4 10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 NA
6 10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 218.6377
7 10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 275.9153
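If a single data frame is needed afterwards, the list can be bound back together. The following is only a sketch: the small `answer` list is a toy stand-in with made-up values so that the snippet runs on its own, and `combined` is a name introduced here for illustration.

```r
# Toy stand-in for the `answer` list produced by lapply() (values are illustrative only)
answer <- list(
  a = data.frame(ID = "a", distances = c(NA, 5)),
  b = data.frame(ID = "b", distances = c(NA, 3, 4))
)

# Stack the per-ID data frames into one data frame
combined <- do.call(rbind, answer)

# rbind() builds row names from the list names; drop them if they are not wanted
rownames(combined) <- NULL
combined
```

Note that the rows come back grouped by ID, not in the original row order of the full data frame.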
There should be an easier way to calculate distances between rows instead of creating 2 series of points.
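One possible simplification, assuming the coordinates really are projected (UTM, in metres) as the EPSG code suggests: the distance between consecutive points is then plain Euclidean distance, so it can be computed with diff() and without building any sf objects. This is a sketch, not a replacement for st_distance in general; it would not be valid for longitude/latitude in degrees. `step_dist` is a hypothetical helper name, and the data below are the first six rows of the example.

```r
# First six rows of the example data (projected coordinates, metres)
data <- data.frame(
  ID   = c("10_17", "10_17", "10_17", "10_24", "10_17", "10_24"),
  Long = c(379800.5, 383409, 383191.2, 383448.6, 378582.5, 383647.4),
  Lat  = c(5181001, 5179885, 5177960, 5179918, 5182110, 5180009)
)

# Hypothetical helper: straight-line distance from the previous row to the current one.
# The first row has no previous point, so it gets NA.
step_dist <- function(x, y) c(NA, sqrt(diff(x)^2 + diff(y)^2))

# Compute the distances within each ID, then let unsplit() put the
# results back in the original row order of `data`
data$distances <- unsplit(
  lapply(split(data, data$ID), function(d) step_dist(d$Long, d$Lat)),
  data$ID
)
data
```

Because unsplit() restores the original row order, the result is a single column on the full data frame rather than a list of data frames, and an ID with only one row simply gets NA.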
Referenced: Converting table columns to spatial objects