Have a situation where I'm trying to calculate start, end time and time spent in different areas in a location.
I have the data with timestamp and the location of the person when captured by the system.
The normal case is when a location change and in that case, the end time should be the previous seen value. The exception is when a person is not seen for 5m or more, in which case the endTime shall be the last seen (see above the line 2 and 3 of the aggregation output required).
Original data
date, zone
8h10m, room1
8h12m, room1
8h15m, hall
8h16m, hall
8h25m, hall
8h29m, hall
8h30m, room2
8h34m, room2
8h38m, room2
8h42m, room2
Aggregation/Summary required in the following way (or similar):
startDate, endDate, time, zone
8h10m, 8h12m, 3m, room1
8h15m, 8h16m, 2m, hall <-- special case time >5m
8h25m, 8h29m, 5m, hall
8h30, 8h42m, 9n, room2
Can you give me an idea of how can I make such "aggregation/summary" in SQL? I'm using BigQuery, but I believe standard SQL should do the job.
Thanks,
Rui
CodePudding user response:
Consider below approach
select
min(date) as startDate, max(date) as endDate,
time_diff(max(date), min(date), minute) 1 as time, zone
from (
select *, countif(new_zone) over (partition by zone order by date) as zone_number
from (
select *,
ifnull(date - lag(date) over (partition by zone order by date) > make_interval(minute => 5)
or zone != lag(zone) over(order by date), true) as new_zone
from your_table
)
)
group by zone, zone_number
if applied to sample data in your question
with your_table as (
select time "8:10:00" as date, "room1" as zone union all
select "8:12:00", "room1" union all
select "8:15:00", "hall" union all
select "8:16:00", "hall" union all
select "8:25:00", "hall" union all
select "8:29:00", "hall" union all
select "8:30:00", "room2" union all
select "8:34:00", "room2" union all
select "8:38:00", "room2" union all
select "8:42:00", "room2"
)
output is
CodePudding user response:
From Mikhail Berlyant solution, the concept with countif
has been used to simplify this query. This answer is capable of identifing every move, also if the person re-eneters a room withhin 5 minutes. See addition data provided in the table.
There are several step needed:
- add lines with zones
---
when 5 minutes no data is there: Setover_5:minutes
true when difference from previous line (lag
) to current line is over 5 minutes.unnest([0,1]) as x
to duplicate dataset andqualify
to include dataset in this case. - sort all of the following statements by the column
date, x
in theover(order by date, x)
- with
lag
obtain the last room and the last date. Because of the unnest x, look two rows back. - compare the last room to the current one, if they differ, set the
zone_change
to true. countif(zone_change)
from 1st to current date to obtain thezone_id
. This curresponds to a single zone.- To this
zone_id
count the time thex
was 0; this is the case when 5 minutes no position was given. group by zone_id
and calculate the min and max date- remove the
---
zones by filtering
With tbl as
(
SELECT TIME "8:10:00" as date, "room1" as zone
UNION ALL SELECT TIME "8:12:00", "room1"
UNION ALL SELECT TIME "8:15:00", "hall"
UNION ALL SELECT TIME "8:16:00", "hall"
UNION ALL SELECT TIME "8:25:00", "hall"
UNION ALL SELECT TIME "8:29:00", "hall"
UNION ALL SELECT TIME "8:30:00", "room2"
UNION ALL SELECT TIME "8:34:00", "room2"
UNION ALL SELECT TIME "8:38:00", "room2"
UNION ALL SELECT TIME "8:42:00", "room2"
UNION ALL SELECT TIME "8:43:00", "hall"
UNION ALL SELECT TIME "8:44:00", "room2"
)
SELECT
zone_id,
zone,
min(date) as startDate,
max(date) as endDate,
time_diff(max(date),min(date),minute) 1 as time_minutes
FROM
(
SELECT *,
countif(x=0) over (ORDER BY date,x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
countif(zone_change) over (ORDER BY date,x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as zone_id
FROM
(
SELECT date,x,if(x=1,zone,"---") as zone,
time_diff(date,lag(date,2) over (order by date),minute)>5 as over_5_minutes,
zone!=lag(zone,2) over (order by date,x) as zone_change
FROM tbl, unnest([0,1]) as x
Qualify over_5_minutes or x=1
)
)
where zone!="---"
group by 1,2
order by 1