Statistics of sessions with a gap-and-island problem

Time:09-27

I am investigating a gap-and-island problem in a log table and joining it with another table to produce statistics on the sessions that have this problem.

Table 1: labels, which records the mode of travel used in each session.

CREATE TABLE labels(user_id INT, session_id INT,
  start_time TIMESTAMP, mode TEXT);

INSERT INTO labels (user_id, session_id, start_time, mode)
VALUES (48, 652, '2016-04-01 00:47:00+01', 'foot'),
       (9, 656, '2016-04-01 00:03:39+01', 'car'), (9, 657, '2016-04-01 00:26:51+01', 'car'),
       (9, 658, '2016-04-01 00:45:19+01', 'car'), (46, 663, '2016-04-01 00:13:12+01', 'car');

Table 2: raw_data, which stores the raw log rows of each session.

CREATE TABLE raw_data(session_id INT,timestamp TIMESTAMP);

INSERT INTO raw_data(session_id, timestamp)
VALUES (652,'2016-04-01 00:46:11.638+01'),(652,'2016-04-01 00:47:00.566+01'),
       (652,'2016-04-01 00:48:06.383+01'),(656,'2016-04-01 00:14:17.707+01'),
       (656,'2016-04-01 00:15:18.664+01'),(656,'2016-04-01 00:16:19.687+01'),
       (656,'2016-04-01 00:24:20.691+01'),(656,'2016-04-01 00:25:23.681+01'),
       (657,'2016-04-01 00:24:50.842+01'),(657,'2016-04-01 00:26:51.096+01'),
       (657,'2016-04-01 00:37:54.092+01');

I want to search each session to find those in which the time difference between two consecutive rows is greater than 5 minutes.

  • I will then report these sessions together with their corresponding mode.

  • I will also report the total number of sessions that have the problem.
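Both reporting steps can be sketched in a single query. This is a minimal sketch, not the fiddle's code: it assumes PostgreSQL, the two tables defined above, and at most one labels row per session_id. The CTE name gaps and the output column names are illustrative. lag() gives, for every log row, the time since the previous row of the same session; rows whose gap exceeds 5 minutes are kept, grouped per session, and count(*) over () attaches the number of affected sessions to every output row.

```sql
-- Minimal sketch, assuming PostgreSQL and at most one labels row per session.
-- "gaps", "gap", "gaps_over_5_min", "longest_gap" and "sessions_with_problem"
-- are illustrative names, not part of the original schema.
with gaps as (
    -- time since the previous log row of the same session
    -- (NULL for the first row of each session)
    select session_id,
           timestamp - lag(timestamp) over (partition by session_id
                                            order by timestamp) as gap
    from   raw_data
)
select g.session_id,
       l.mode,
       count(*)         as gaps_over_5_min,        -- gaps in this session
       max(g.gap)       as longest_gap,
       count(*) over () as sessions_with_problem   -- evaluated after GROUP BY,
                                                   -- so it counts sessions
from   gaps g
join   labels l on l.session_id = g.session_id
where  g.gap > interval '5 minutes'                -- NULL gaps are dropped here
group  by g.session_id, l.mode
order  by g.session_id;
```

count(*) over () gives the session total because window functions run after GROUP BY, so the window sees one row per affected session. With the sample data this should return sessions 656 and 657 (both car, one gap each, longest gaps of roughly 8 and 11 minutes) with sessions_with_problem = 2.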

Note: here is the dbfiddle.

Answer:

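select session_id
      ,timestamp
      ,user_id
      ,start_time
      -- diff = 1 marks both rows around a gap, so halving the count of flagged
      -- rows equals the number of sessions only if each has exactly one gap
      ,count(diff) over()/2 as number_of_session_with_problem
from  (
       select *
              -- flag a row when the gap to the previous or to the next row
              -- of the same session is longer than 5 minutes
              ,case when timestamp - lag(timestamp) over(partition by session_id order by timestamp) > '00:05:00.000' then 1
                    when lead(timestamp) over(partition by session_id order by timestamp) - timestamp > '00:05:00.000' then 1
               end as diff
       from   raw_data join labels using(session_id)
      ) t
where diff = 1;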
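Output:

session_id | timestamp               | user_id | start_time          | number_of_session_with_problem
-----------+-------------------------+---------+---------------------+-------------------------------
656        | 2016-04-01 00:16:19.687 | 9       | 2016-04-01 00:03:39 | 2
656        | 2016-04-01 00:24:20.691 | 9       | 2016-04-01 00:03:39 | 2
657        | 2016-04-01 00:26:51.096 | 9       | 2016-04-01 00:26:51 | 2
657        | 2016-04-01 00:37:54.092 | 9       | 2016-04-01 00:26:51 | 2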
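The count(diff) over()/2 expression returns 2 here because each affected session has exactly one gap, and each gap flags exactly two rows (the row before it and the row after it). A session with two separate gaps would flag four rows and be counted as two sessions. A minimal variation, assuming PostgreSQL and the question's tables, keeps the same row-level output, adds the mode column the question asks for, and counts distinct session_id values instead. The names flagged, gap_before, gap_after and number_of_sessions_with_problem are illustrative.

```sql
-- Sketch, assuming PostgreSQL: same row-level output as the answer's query,
-- plus mode, with the session count taken from COUNT(DISTINCT ...) rather
-- than from halving the number of flagged rows.
-- "flagged", "gap_before" and "gap_after" are illustrative names.
with flagged as (
    select r.session_id, r.timestamp, l.user_id, l.start_time, l.mode,
           -- more than 5 minutes since the previous row of the session?
           r.timestamp - lag(r.timestamp) over w > interval '5 minutes' as gap_before,
           -- more than 5 minutes until the next row of the session?
           lead(r.timestamp) over w - r.timestamp > interval '5 minutes' as gap_after
    from   raw_data r
    join   labels l on l.session_id = r.session_id
    window w as (partition by r.session_id order by r.timestamp)
)
select session_id, timestamp, user_id, start_time, mode,
       (select count(distinct session_id)
          from flagged
         where gap_before or gap_after) as number_of_sessions_with_problem
from   flagged
-- a NULL flag (first or last row of a session) passes only if the other is TRUE
where  gap_before or gap_after
order  by session_id, timestamp;
```

On the sample data this should return the same four rows as the answer, each with mode car and a session count of 2; the difference only shows up once a session contains more than one gap.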

Fiddle
