count occurence of events in GPS data R-CodePudding

I have a dataframe of many GPS data. The df contains timestamp, duration, latitude, longitude and altitude.

Record Time          Duration   Longitude   Latitude    Altitude
07/04/2020 10:00:00  00:00.00   08.32436    45.53638    85.4
07/04/2020 10:01:00  00:01.00   08.32436    45.53638    85.4
07/04/2020 10:02:00  00:02.00   08.32436    45.53638    85.4

A truck delivers goods from point A to point B. I want to count the amount of finished journeys from point A to point B. The truck has to stay a certain time period within the area of point A and B.

I thought of defining the areas and find out the occurences within the areas. Somehting like this can define the area of a point and count the GPS points:

sum(df$Longitude>=8.30 & df$Longitude<=8.35 & df$Latitude<=45.50 & df$Latitude>=45.55 & df$Altitude>=85.35 & df$Altitude<=85.45)

But this gives the amount of all GPS points in the defined area. I want to count only one occurence for a group of GPS points according to a stay. For example, if the truck stays 10 minutes in the area, I want to count one occurence and not all GPS data points. The staying time is each stay different. Because of this not a fixed duration can be included. I was thinking, if there is one occurence for the area of point A and afterwards an occurence in the area of point B, this can be accounted together as one journey. Perfect it would be, to have a table at the result, which indicates the amount of finished journeys and start and end time of each journey. Results should be look like this:

Amount journeys: 3
Journey Start   End
1      10:00    11:00
2      12:00    13:00
3      14:00    15:00

CodePudding user response：

Using data.table:

setDT(df)[setDT(df.areas), on = c("Longitude>=min_long", "Longitude<max_long", "Latitude>=min_lat", "Latitude<max_lat", "Altitude>=min_alt", "Altitude<max_alt"), 
           Code := Code]

dt <- df[!is.na(Code), .(End = min(`Record Time`), Start = max(`Record Time`)), by=rleid(Code)]
dt[, Start := shift(Start, type="lag")]
dt[, Journey := rleid - 1]

Which produces:

   Journey               Start                 End
1:       1 2020-04-07 10:05:00 2020-04-07 10:06:00
2:       2 2020-04-07 10:11:00 2020-04-07 10:16:00
4:       3 2020-04-07 10:19:00 2020-04-07 10:20:00

Data:

dput(df)
structure(list(`Record Time` = structure(c(1586246400, 1586246460, 
1586246520, 1586246580, 1586246640, 1586246700, 1586246760, 1586246820, 
1586246880, 1586246940, 1586247000, 1586247060, 1586247120, 1586247180, 
1586247240, 1586247300, 1586247360, 1586247420, 1586247480, 1586247540, 
1586247600, 1586247660, 1586247720), class = c("POSIXct", "POSIXt"
), tzone = ""), Duration = c("  00:00.00", "  00:01.00", "  00:02.00", 
"  00:03.00", "  00:04.00", "  00:05.00", "  00:00.00", "  00:01.00", 
"  00:02.00", "  00:03.00", "  00:04.00", "  00:05.00", "  00:00.00", 
"  00:01.00", "  00:02.00", "  00:03.00", "  00:00.00", "  00:01.00", 
"  00:02.00", "  00:03.00", "  00:00.00", "  00:01.00", "  00:02.00"
), Longitude = c(8.32436, 8.32436, 8.32436, 8.32436, 8.32436, 
8.32436, 8.29563, 8.29563, 8.29563, 8.29563, 8.29563, 8.29563, 
8.37429, 8.37429, 8.37429, 8.37429, 8.32436, 8.32436, 8.32436, 
8.32436, 8.42005, 8.42005, 8.42005), Latitude = c(45.53638, 45.53638, 
45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 
45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 
45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638, 45.53638
), Altitude = c(85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 
85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 85.4, 
85.4, 85.4, 85.4, 85.4)), row.names = c(NA, -23L), class = "data.frame")

dput(df.areas)
structure(list(Code = c("A", "B", "C"), min_long = c(8.3, 8.25, 
8.4), max_long = c(8.35, 8.3, 8.45), min_lat = c(45.5, 45.5, 
45.5), max_lat = c(45.55, 45.55, 45.55), min_alt = c(85.35, 85.35, 
85.35), max_alt = c(85.45, 85.45, 85.45)), class = c("data.table", 
"data.frame"), row.names = c(NA, -3L), .internal.selfref = <pointer: 0x000001f1ee951ef0>)

CodePudding user response：

Ok, given that you know the area locations, I think we can make the assumption that a journey is finished, when a truck has been at two areas. In that case, we need neither altitude nor duration in the dataset, but something like a driver- or truck-id, which I just presume for now:

  driver_id           timestamp longitude latitude
1         1 2020-07-04 06:00:00        20       40
2         1 2020-07-04 07:00:00        21       40
3         1 2020-07-04 08:00:00        22       39
4         2 2020-07-04 07:00:00         4       30
5         2 2020-07-04 12:00:00         9       32
6         3 2020-07-04 15:00:00        10       10

Secondly, you have the area-data:

  area_id longitude latitude
1       A        20       40
2       B        22       39
3       C         4       30
4       D         9       32
5       E        10       10

How to proceed?

1 - Join the area with the journey data, so that for each row we know, if the truck is at an area or not:

> journeys_with_areas <- journeys %>% left_join(areas) 
> journeys_with_areas

  driver_id           timestamp longitude latitude area_id
1         1 2020-07-04 06:00:00        20       40       A
2         1 2020-07-04 07:00:00        21       40    <NA>
3         1 2020-07-04 08:00:00        22       39       B
4         2 2020-07-04 07:00:00         4       30       C
5         2 2020-07-04 12:00:00         9       32       D
6         3 2020-07-04 15:00:00        10       10       E

2 - Filter out locations that are not an area

> journeys_with_areas <- journeys_with_areas %>% 
>    filter(!is.na(area_id))

3 - Filter out unfinished journeys

> journeys_with_areas <- journeys_with_areas %>% 
>     group_by(driver_id) %>% filter(n() > 1)

4 - Add start- and end-time to each row per journey and reduce data.frame as you wished

> journeys_with_areas %>% 
>     mutate(start = min(timestamp), stop = max(timestamp)) %>%
>     select(driver_id, start, stop) %>% unique

  driver_id start               stop               
  <fct>     <dttm>              <dttm>             
1 1         2020-07-04 06:00:00 2020-07-04 08:00:00
2 2         2020-07-04 07:00:00 2020-07-04 12:00:00