Home > Net >  SQL aggregation based on time and column change
SQL aggregation based on time and column change

Time:03-21

Have a situation where I'm trying to calculate start, end time and time spent in different areas in a location.

I have the data with timestamp and the location of the person when captured by the system.

The normal case is when a location change and in that case, the end time should be the previous seen value. The exception is when a person is not seen for 5m or more, in which case the endTime shall be the last seen (see above the line 2 and 3 of the aggregation output required).

Original data

date, zone
8h10m, room1
8h12m, room1
8h15m, hall
8h16m, hall
8h25m, hall
8h29m, hall
8h30m, room2
8h34m, room2
8h38m, room2
8h42m, room2

Aggregation/Summary required in the following way (or similar):

startDate, endDate, time, zone
8h10m, 8h12m, 3m, room1
8h15m, 8h16m, 2m, hall   <-- special case time >5m
8h25m, 8h29m, 5m, hall
8h30, 8h42m, 9n, room2

Can you give me an idea of how can I make such "aggregation/summary" in SQL? I'm using BigQuery, but I believe standard SQL should do the job.

Thanks,

Rui

CodePudding user response:

Consider below approach

select 
  min(date) as startDate, max(date) as endDate, 
  time_diff(max(date), min(date), minute)   1 as time, zone
from (
  select *, countif(new_zone) over (partition by zone order by date) as zone_number
  from (
    select *, 
      ifnull(date - lag(date) over (partition by zone order by date) > make_interval(minute => 5)
        or zone != lag(zone) over(order by date), true) as new_zone
    from your_table
  )
)
group by zone, zone_number        

if applied to sample data in your question

with your_table as (
  select time "8:10:00" as date, "room1" as zone union all 
  select "8:12:00", "room1" union all 
  select "8:15:00", "hall" union all 
  select "8:16:00", "hall" union all 
  select "8:25:00", "hall" union all 
  select "8:29:00", "hall" union all 
  select "8:30:00", "room2" union all 
  select "8:34:00", "room2" union all 
  select "8:38:00", "room2" union all 
  select "8:42:00", "room2"
)           

output is

enter image description here

CodePudding user response:

From Mikhail Berlyant solution, the concept with countif has been used to simplify this query. This answer is capable of identifing every move, also if the person re-eneters a room withhin 5 minutes. See addition data provided in the table.

There are several step needed:

  • add lines with zones --- when 5 minutes no data is there: Set over_5:minutes true when difference from previous line (lag) to current line is over 5 minutes. unnest([0,1]) as x to duplicate dataset and qualify to include dataset in this case.
  • sort all of the following statements by the column date, x in the over(order by date, x)
  • with lag obtain the last room and the last date. Because of the unnest x, look two rows back.
  • compare the last room to the current one, if they differ, set the zone_change to true.
  • countif(zone_change) from 1st to current date to obtain the zone_id. This curresponds to a single zone.
  • To this zone_id count the time the x was 0; this is the case when 5 minutes no position was given.
  • group by zone_id and calculate the min and max date
  • remove the --- zones by filtering
With tbl as
(
SELECT TIME "8:10:00" as date, "room1" as zone
UNION ALL SELECT TIME "8:12:00", "room1"
UNION ALL SELECT TIME "8:15:00", "hall"
UNION ALL SELECT TIME "8:16:00", "hall"
UNION ALL SELECT TIME "8:25:00", "hall"
UNION ALL SELECT TIME "8:29:00", "hall"
UNION ALL SELECT TIME "8:30:00", "room2"
UNION ALL SELECT TIME "8:34:00", "room2"
UNION ALL SELECT TIME "8:38:00", "room2"
UNION ALL SELECT TIME "8:42:00", "room2"
UNION ALL SELECT TIME "8:43:00", "hall"
UNION ALL SELECT TIME "8:44:00", "room2"
)

SELECT 
zone_id,
zone,
min(date) as startDate,
max(date) as endDate,
time_diff(max(date),min(date),minute) 1 as time_minutes
FROM
(
SELECT *,
countif(x=0)  over (ORDER BY date,x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
countif(zone_change) over (ORDER BY date,x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as zone_id
FROM
(
SELECT date,x,if(x=1,zone,"---") as zone,
time_diff(date,lag(date,2) over (order by date),minute)>5 as over_5_minutes,
zone!=lag(zone,2) over (order by date,x) as zone_change
FROM tbl, unnest([0,1]) as x
Qualify over_5_minutes or x=1
)
)
where zone!="---"
group by 1,2
order by 1
  • Related