I have a dataframe of many GPS data. The df contains timestamp, duration, latitude, longitude and altitude.
Record Time Duration Longitude Latitude Altitude
07/04/2020 10:00:00 00:00.00 08.32436 45.53638 85.4
07/04/2020 10:01:00 00:01.00 08.32436 45.53638 85.4
07/04/2020 10:02:00 00:02.00 08.32436 45.53638 85.4
A truck delivers goods from point A to point B. I want to count the amount of finished journeys from point A to point B. The truck has to stay a certain time period within the area of point A and B.
I thought of defining the areas and find out the occurences within the areas. Somehting like this can define the area of a point and count the GPS points:
sum(df$Longitude>=8.30 & df$Longitude<=8.35 & df$Latitude<=45.50 & df$Latitude>=45.55 & df$Altitude>=85.35 & df$Altitude<=85.45)
But this gives the amount of all GPS points in the defined area. I want to count only one occurence for a group of GPS points according to a stay. For example, if the truck stays 10 minutes in the area, I want to count one occurence and not all GPS data points. The staying time is each stay different. Because of this not a fixed duration can be included. I was thinking, if there is one occurence for the area of point A and afterwards an occurence in the area of point B, this can be accounted together as one journey. Perfect it would be, to have a table at the result, which indicates the amount of finished journeys and start and end time of each journey. Results should be look like this:
Amount journeys: 3
Journey Start End
1 10:00 11:00
2 12:00 13:00
3 14:00 15:00
CodePudding user response:
Using data.table
:
setDT(df)[setDT(df.areas), on = c("Longitude>=min_long", "Longitude<max_long", "Latitude>=min_lat", "Latitude<max_lat", "Altitude>=min_alt", "Altitude<max_alt"),
Code := Code]
dt <- df[!is.na(Code), .(End = min(`Record Time`), Start = max(`Record Time`)), by=rleid(Code)]
dt[, Start := shift(Start, type="lag")]
dt[, Journey := rleid - 1]
Which produces:
Journey Start End
1: 1 2020-04-07 10:05:00 2020-04-07 10:06:00
2: 2 2020-04-07 10:11:00 2020-04-07 10:16:00
4: 3 2020-04-07 10:19:00 2020-04-07 10:20:00
Data:
dput(df)
structure(list(`Record Time` = structure(c(1586246400, 1586246460,
1586246520, 1586246580, 1586246640, 1586246700, 1586246760, 1586246820,
1586246880, 1586246940, 1586247000, 1586247060, 1586247120, 1586247180,
1586247240, 1586247300, 1586247360, 1586247420, 1586247480, 1586247540,
1586247600, 1586247660, 1586247720), class = c("POSIXct", "POSIXt"
), tzone = ""), Duration = c(" 00:00.00", " 00:01.00", " 00:02.00",
" 00:03.00", " 00:04.00", " 00:05.00", " 00:00.00", " 00:01.00",
" 00:02.00", " 00:03.00", " 00:04.00", " 00:05.00", " 00:00.00",
" 00:01.00", " 00:02.00", " 00:03.00", " 00:00.00", " 00:01.00",
" 00:02.00", " 00:03.00", " 00:00.00", " 00:01.00", " 00:02.00"
), Longitude = c(8.32436, 8.32436, 8.32436, 8.32436, 8.32436,
8.32436, 8.29563, 8.29563, 8.29563, 8.29563, 8.29563, 8.29563,
8.37429, 8.37429, 8.37429, 8.37429, 8.32436, 8.32436, 8.32436,
8.32436, 8.42005, 8.42005, 8.42005), Latitude = c(45.53638, 45.53638,
45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638,
45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638,
45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638
), Altitude = c(85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4,
85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4,
85.4, 85.4, 85.4, 85.4)), row.names = c(NA, -23L), class = "data.frame")
dput(df.areas)
structure(list(Code = c("A", "B", "C"), min_long = c(8.3, 8.25,
8.4), max_long = c(8.35, 8.3, 8.45), min_lat = c(45.5, 45.5,
45.5), max_lat = c(45.55, 45.55, 45.55), min_alt = c(85.35, 85.35,
85.35), max_alt = c(85.45, 85.45, 85.45)), class = c("data.table",
"data.frame"), row.names = c(NA, -3L), .internal.selfref = <pointer: 0x000001f1ee951ef0>)
CodePudding user response:
Ok, given that you know the area locations, I think we can make the assumption that a journey is finished, when a truck has been at two areas. In that case, we need neither altitude nor duration in the dataset, but something like a driver- or truck-id, which I just presume for now:
driver_id timestamp longitude latitude
1 1 2020-07-04 06:00:00 20 40
2 1 2020-07-04 07:00:00 21 40
3 1 2020-07-04 08:00:00 22 39
4 2 2020-07-04 07:00:00 4 30
5 2 2020-07-04 12:00:00 9 32
6 3 2020-07-04 15:00:00 10 10
Secondly, you have the area-data:
area_id longitude latitude
1 A 20 40
2 B 22 39
3 C 4 30
4 D 9 32
5 E 10 10
How to proceed?
1 - Join the area with the journey data, so that for each row we know, if the truck is at an area or not:
> journeys_with_areas <- journeys %>% left_join(areas)
> journeys_with_areas
driver_id timestamp longitude latitude area_id
1 1 2020-07-04 06:00:00 20 40 A
2 1 2020-07-04 07:00:00 21 40 <NA>
3 1 2020-07-04 08:00:00 22 39 B
4 2 2020-07-04 07:00:00 4 30 C
5 2 2020-07-04 12:00:00 9 32 D
6 3 2020-07-04 15:00:00 10 10 E
2 - Filter out locations that are not an area
> journeys_with_areas <- journeys_with_areas %>%
> filter(!is.na(area_id))
3 - Filter out unfinished journeys
> journeys_with_areas <- journeys_with_areas %>%
> group_by(driver_id) %>% filter(n() > 1)
4 - Add start- and end-time to each row per journey and reduce data.frame as you wished
> journeys_with_areas %>%
> mutate(start = min(timestamp), stop = max(timestamp)) %>%
> select(driver_id, start, stop) %>% unique
driver_id start stop
<fct> <dttm> <dttm>
1 1 2020-07-04 06:00:00 2020-07-04 08:00:00
2 2 2020-07-04 07:00:00 2020-07-04 12:00:00